Thanks for taking an interest in this post. Basically, I'm looking for advice on working with data from different database implementations within PHP, or, if PHP isn't suitable for these tasks, recommendations for other approaches.
The task I would like to accomplish can be illustrated with the following example. On server A, I have a MySQL database where I store demographic information about users, organised by user_id; this runs to about 200,000 rows. On server B, I have users' usage data, stored by user_id and event_id in a Vertica database, which runs to about 300,000,000 rows.
I would like to find a way to join these datasets so I can produce summarised output: aggregated user events taken from the Vertica database, grouped by attributes held in the MySQL database such as age and location, via a join on the 'user_id' field.
I realise that this could be accomplished by copying either of these tables onto the other server, but I'm curious whether it can be achieved without doing so.
My questions are:
Can PHP do operations like this? If so, a link to an example would be really welcome.
Do you need to load the data into arrays and join there? Can you join arrays in PHP like tables in a database (roughly along the lines of the sketch below)? Can PHP handle arrays this large?
Are there any other approaches that I should be considering instead?
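For concreteness, the kind of array join I have in mind for the second question might look roughly like the sketch below. The DSNs, table and column names are just placeholders, and I suspect this would struggle with 300,000,000 rows unless the aggregation is pushed down to Vertica first:

<?php
// Rough sketch only: aggregate the Vertica events per user_id, then "join"
// against the MySQL demographics in a PHP array keyed by user_id.
// Connection details, table and column names below are placeholders.
$mysql   = new PDO('mysql:host=serverA;dbname=demo', 'user', 'pass');
$vertica = new PDO('odbc:VerticaDSN', 'user', 'pass'); // e.g. via the ODBC driver

// Aggregate in Vertica so only one row per user crosses the wire.
$events = [];
$sql = "SELECT user_id, COUNT(*) AS event_count FROM usage_events GROUP BY user_id";
foreach ($vertica->query($sql) as $row) {
    $events[$row['user_id']] = (int) $row['event_count'];
}

// "Join" in PHP: group the per-user event counts by the MySQL demographics.
$summary = [];
foreach ($mysql->query("SELECT user_id, age, location FROM users") as $user) {
    $key = $user['age'] . '|' . $user['location'];
    $summary[$key] = ($summary[$key] ?? 0) + ($events[$user['user_id']] ?? 0);
}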
Thanks in advance for any help,
James
I suggest using Talend! It is an open-source ETL tool that has MySQL and Vertica connectors built in.
You can aggregate data sets from any RDBMS, as long as Talend has access to them, and then dump the results wherever you need.
Give it a try.
My stack is php and mysql.
I am trying to design a page to display details of a mutual fund.
Data for a single fund is distributed over 15-20 different tables.
Currently, my front-end is a brute-force PHP page that queries/joins these tables using 8 different queries for a single scheme. It's messy and performs poorly.
I am considering alternatives. Good thing is that the data changes only once a day, so I can do some preprocessing.
An option I am considering is to run these queries for every fund (about 2000 funds), create a complex JSON object for each of them, store it in MySQL indexed by fund code, then retrieve the JSON at run time and show the data. I am thinking of using the simple JSON_OBJECT() MySQL function to create the JSON, and json_decode in PHP to get the values for display. Is this a good approach?
I was tempted to store them in a separate MongoDB store - would that be overkill for this?
Any other suggestion?
Thanks much!
To meet your objective of quick pageviews, your overnight-run approach is very good. You could generate JSON objects with your distilled data, or even prerendered HTML pages, and store them.
You can certainly store JSON objects in MySQL columns. If you don't need the database server to search the objects, simply use TEXT (or LONGTEXT) data types to store them.
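As a rough illustration (the table and column names are made up for the example, and you can build the JSON with either MySQL's JSON_OBJECT() or PHP's json_encode()), the overnight run and the page-time lookup could look something like this:

<?php
// Sketch: one pre-built JSON snapshot per fund, keyed by fund code.
$pdo = new PDO('mysql:host=localhost;dbname=funds', 'user', 'pass');

$pdo->exec("
    CREATE TABLE IF NOT EXISTS fund_snapshot (
        fund_code VARCHAR(32) PRIMARY KEY,
        snapshot  LONGTEXT NOT NULL,
        built_at  DATETIME NOT NULL
    ) ENGINE=InnoDB
");

// Overnight: run the per-fund queries, collect the results into one array,
// and store it as a single JSON document.
$data = [/* ... results of the 8 per-fund queries ... */];
$stmt = $pdo->prepare(
    "REPLACE INTO fund_snapshot (fund_code, snapshot, built_at) VALUES (?, ?, NOW())"
);
$stmt->execute(['ABC123', json_encode($data)]);

// At page-view time: one primary-key lookup plus json_decode.
$stmt = $pdo->prepare("SELECT snapshot FROM fund_snapshot WHERE fund_code = ?");
$stmt->execute(['ABC123']);
$fund = json_decode($stmt->fetchColumn(), true);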
To my way of thinking, adding a new type of server (MongoDB) to your operations just to store a few thousand JSON objects does not seem worth the trouble. If you find it necessary to search the contents of your JSON objects, however, another type of server might be useful.
Other things to consider:
Optimize your SQL queries. Read up: https://use-the-index-luke.com and other sources of good info. Consider your queries one by one, starting with the slowest. Use the EXPLAIN or even the EXPLAIN ANALYZE command to get your MySQL server to tell you how it plans each query, and judiciously add indexes (see the sketch after this list). Using the query-optimization tag here on StackOverflow, you can get help. Many queries can be optimized by adding indexes to MySQL without changing anything in your PHP code or your data, so this can be an ongoing project rather than a big new software release.
Consider measuring your query times. You can do this with MySQL's slow query log. The point of this is to identify your "dirty dozen" slowest queries in a particular time period. Then, see step one.
Make your pages fill up progressively, to keep your users busy reading while you fetch the data they need. Put the top-level stuff (fund name, etc.) in server-side HTML so search engines can see it. Use some sort of front-end tech (React, maybe, or DataTables fetching data via AJAX) to render the rest client-side, and provide REST endpoints on your server that return the data, in JSON format, for each data block in the page.
In your overnight run create a sitemap file along with your JSON data rows. That lets you control exactly how you want search engines to present your data.
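For the first point, the index-tuning loop might look roughly like this (fund_holdings and fund_code are invented names, purely for illustration):

<?php
// Hypothetical example of checking a slow query's plan and adding an index.
$pdo = new PDO('mysql:host=localhost;dbname=funds', 'user', 'pass');

// Ask MySQL how it plans to execute one of the slow queries.
$plan = $pdo->query("EXPLAIN SELECT * FROM fund_holdings WHERE fund_code = 'ABC'")
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($plan); // look for type=ALL (full table scan) and large "rows" estimates

// If the plan shows a full scan on the filtered column, add an index
// and check the plan again.
$pdo->exec("ALTER TABLE fund_holdings ADD INDEX idx_fund_code (fund_code)");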
I know that there is an abundance of articles and questions on related topics. I have been researching them for a few days now and I am more confused than ever.
I am a newbie to the website world, and I have learned as I went and almost fully developed a website using MySQL, PHP and JavaScript. While adding a search feature using ElasticSearch, I got confused about whether I should stick with MySQL and use it in conjunction with ElasticSearch, or switch to some NoSQL database.
Presently I have interconnected tables such as users (this table has lots of optional fields), customers (which has a column for the customer's role), articles, likes, ratings, comments and appointments.
The data in these tables might need to change later, and I do require filtering and personalisation. I would be using an external payment gateway.
I have no experience with NoSQL or NewSQL. I fear that if I change to NoSQL the data might become a mess. Also, I would like to get this done quickly. That being said, it would be easier to switch now than later, as I don't have any data stored yet.
Should I change the type of database? If yes, to which one, and can I do that just by changing my SQL queries in PHP to NoSQL calls (and how would I do that)? How much of the code will I have to change, and how can I do it efficiently? Or could/should I add the data for personalisation to ElasticSearch instead?
Thanks for your help and please bear with me if this seems like an obvious choice!
Problem statement: I am working on an application in which a user can follow other users (like Twitter or other e-commerce sites) and get their updates on his wall. It involves merchants and users: a user can follow any merchant, and the user can himself be a merchant, so it is really users following other users (a many-to-many relation).
Issue: The easiest way to go about it is a junction table with
id (auto-increment) | follower_user_id | followed_user_id. But I am not sure how well this will scale as the table grows: if a user follows 100 people there would be 100 rows for that single user, and in that case fetching the followers of any user could take a long time.
Research: I tried studying Twitter and other websites and their DB designs, but they use different databases (graph-based NoSQL etc.) to solve their problems; in our case it's MySQL. I also looked at caching mechanisms, but I would like to know whether there is any way I could store the values horizontally, i.e. each user has all his followers in a single row (comma-separated values would be tedious; I tried it).
Could I have a separate database for this feature, something like a NoSQL database (MongoDB etc.)? What impact would that have on performance in different cases?
If my approach of going the easiest way is right, how can I improve the performance for, say, 5-10k users (looking at a small base for now)? Would basic MySQL queries work well?
Please help me with any input on this.
The system I use (my personal preference) is to add two columns to the users table, following and followers, and store a simple encrypted JSON array in each, holding the IDs of the followers and of the users being followed.
The only drawback is that when querying you have to decrypt it and then json_decode it, but it has worked fine for me for almost 2 years.
After going through the comments and doing some research, I came to the conclusion that it would be better to go the normal way: create the followers table, add some indexing, and put a caching mechanism in front of it.
For indexing, composite indexes as suggested should work well.
For caching I am planning to use Memcache!
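Roughly what I have in mind for the table and the composite indexes is sketched below (names are illustrative; in this layout the surrogate auto-increment id is dropped in favour of a composite primary key):

<?php
// Sketch of the junction-table approach with composite indexes.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pdo->exec("
    CREATE TABLE IF NOT EXISTS followers (
        follower_user_id INT UNSIGNED NOT NULL,
        followed_user_id INT UNSIGNED NOT NULL,
        PRIMARY KEY (follower_user_id, followed_user_id),
        KEY idx_followed (followed_user_id, follower_user_id)
    ) ENGINE=InnoDB
");

// 'Who follows user 42?' becomes a range scan on idx_followed, which stays
// fast even at a few hundred thousand rows.
$stmt = $pdo->prepare("SELECT follower_user_id FROM followers WHERE followed_user_id = ?");
$stmt->execute([42]);
$followerIds = $stmt->fetchAll(PDO::FETCH_COLUMN);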
I am building a web app, and I am thinking about how I should build the database.
The app will be fed keywords; it will then retrieve info for those keywords and save it into the database with a datestamp. The info will come from different sources: number of results from Yahoo, Diggs from the last month that contain that keyword, etc.
So I was thinking a simple way to do it would be to have a table with an id and keyword column where the keywords would be stored, and another table for ALL the data with an id (the same as the keyword id), datestamp, data_name and data_content.
Is this a good way to use MySQL, or could this in some way make queries slower or something? Should I build separate tables for each type of data I want to use? I am mostly looking for good performance in the application.
Another reason I would like to use only one table for the data is that I can easily add more data_name(s) without touching the DB.
In the second table, i.e. the table which contains the various information about the keywords, the id column can be used as a foreign key referencing the id column of the first table.
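A minimal sketch of that layout, with illustrative names, could look like this:

<?php
// Two-table layout: a keyword table and a generic data table keyed to it.
$pdo = new PDO('mysql:host=localhost;dbname=keywords_app', 'user', 'pass');

$pdo->exec("
    CREATE TABLE IF NOT EXISTS keyword (
        id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        keyword VARCHAR(255) NOT NULL UNIQUE
    ) ENGINE=InnoDB
");

$pdo->exec("
    CREATE TABLE IF NOT EXISTS keyword_data (
        keyword_id INT UNSIGNED NOT NULL,
        datestamp DATE NOT NULL,
        data_name VARCHAR(64) NOT NULL,      -- e.g. 'yahoo_results', 'digg_count'
        data_content TEXT,
        KEY idx_lookup (keyword_id, data_name, datestamp),
        CONSTRAINT fk_keyword FOREIGN KEY (keyword_id) REFERENCES keyword (id)
    ) ENGINE=InnoDB
");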
Is this a good way to use MySQL, or could this in some way make queries slower or something? Should I build tables for each type of data I want to use? I am mostly looking for good performance in the application.
A lot of big MySQL players (Flickr, for example) use MySQL as a simple KV (key-value) store.
Furthermore, if you are concerned with performance, you should cache your data in memcached/Redis (nothing beats memory).
I have another recommendation: index your content. If you plan to store content and be able to search it using keywords, then use MySQL to store the details of your documents (author, text and other info) and use Lucene to create an index. Lucene was originally written for Java, but it has ports to many languages, and PHP is no exception.
You can use the Zend Framework to manage Lucene indexes with very little effort; browse through the documentation or look for a tutorial online (a short sketch follows the list below). The point of this recommendation is simple:
- You'll improve your search time drastically
- Your keyword matching will be more forgiving, since Lucene gives you real full-text search power
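A minimal Zend_Search_Lucene sketch (Zend Framework 1; the index path and field names here are just for illustration):

<?php
require_once 'Zend/Search/Lucene.php';

// Build (or rebuild) the index from documents you keep in MySQL.
$index = Zend_Search_Lucene::create('/path/to/lucene-index');

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Keyword('doc_id', '123'));   // stored, not tokenised
$doc->addField(Zend_Search_Lucene_Field::Text('title', 'Example document title'));
$doc->addField(Zend_Search_Lucene_Field::UnStored('body', 'Full text to index ...'));
$index->addDocument($doc);

// Later, at search time:
$index = Zend_Search_Lucene::open('/path/to/lucene-index');
foreach ($index->find('some keywords') as $hit) {
    echo $hit->doc_id . ': ' . $hit->title . ' (score ' . $hit->score . ")\n";
}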
I hope I can help!
Best of luck!
I have several pieces of PHP code doing jobs that MySQL could do, such as sorting, merging data from different MySQL tables, and so on.
Lately, I found out that I can do all of this with a single MySQL query.
I am wondering whether it is better to give these MySQL-capable jobs to MySQL or to PHP, in terms of efficiency, speed, etc.
Thank You,
If you do it in PHP you are just re-implementing the features that MySQL already has. It's far from the most optimized solution and therefore it is much slower.
You should definitely do it in the SQL query.
Your performance will increase if you let MySQL handle that work.
It will perform better to do this in MySQL. Firstly, MySQL has optimized sorting algorithms and can use any indexes that have been created. Furthermore, if it is doing the merging and filtering, you will end up transferring less data from the database.
Databases are optimized to carry out these functions while retrieving the data. Sorting at the database level is also much easier to read than writing tens of lines of PHP code looping over lists or collections.
There are also ready-made string functions in MySQL (CONCAT, for example) to merge the data as it is retrieved from the database.
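A small illustration of the difference (table and column names are invented for the example): the same join, concatenation and sort done once in SQL versus re-implemented in PHP.

<?php
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// In MySQL: joining, string merging and ordering happen server-side,
// can use indexes, and only the finished rows cross the wire.
$rows = $pdo->query("
    SELECT CONCAT(c.first_name, ' ', c.last_name) AS customer, o.total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    ORDER BY o.total DESC
")->fetchAll(PDO::FETCH_ASSOC);

// The PHP equivalent would pull both tables in full, join them by hand and
// then sort everything in memory, e.g.:
//     usort($rows, function ($a, $b) { return $b['total'] <=> $a['total']; });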
I definitely would suggest MySQL.
Do it in MySQL. There's no question that it is more efficient; PHP will use much more memory, for one.
No question: MySQL is built for this.
To add something, maybe you'd be interested in building joined-table queries (multiple-table queries). They are very helpful and really quite simple. For instance:
$query = "SELECT DISTINCT post.title as title, post.id as id,
product.imageURL as imageURL, product.dueDate as dueDate
FROM post, product
WHERE post.status='saved'
AND post.productURL=product.linkURL
AND post.userEmail='$session[userEmail]'
AND NOT EXISTS(
SELECT publication.postId FROM publication
WHERE publication.postId=post.id
)
ORDER BY post.id";
This is a simple example from some code I built.
The point is that it merges two different tables through the restriction post.productURL = product.linkURL. It also uses negation, which is pretty useful when the set you are looking for is defined not by a condition but by the absence of one.
You can also avoid repeating a long query like this by building views in MySQL.
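For instance, a view wrapping the join above might look roughly like this (a sketch to show the idea; names mirror the example query):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Wrap the join once in a view ...
$pdo->exec("
    CREATE OR REPLACE VIEW unpublished_posts AS
    SELECT DISTINCT post.title, post.id, post.userEmail,
           product.imageURL, product.dueDate
    FROM post
    JOIN product ON post.productURL = product.linkURL
    WHERE post.status = 'saved'
      AND NOT EXISTS (SELECT 1 FROM publication WHERE publication.postId = post.id)
");

// ... so the page-level query stays short (and the e-mail is bound, not interpolated).
$stmt = $pdo->prepare("SELECT * FROM unpublished_posts WHERE userEmail = ? ORDER BY id");
$stmt->execute([$session['userEmail']]);
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);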
I'm a newbie myself, so I hope it helps. Cheers.