PHP/MySQL efficient way to spot specific pattern in data

I have an Arduino grabbing the outside light level from an LDR at 1 minute intervals. I am therefore storing the data for each day as a time series dataset with a light percentage and a timestamp. The data looks like the sample below (although it is stored in a MySQL db) and produces this graph:
{"Timestamp":"2017-03-22 14:48:48","ExternalLight":"99.5"},{"Timestamp":"2017-03-22 14:47:46","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:46:44","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:45:42","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:44:40","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:43:38","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:42:36","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:41:34","ExternalLight":"99.7"},{"Timestamp":"2017-03-22 14:40:32","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:39:30","ExternalLight":"99.5"},{"Timestamp":"2017-03-22 14:38:28","ExternalLight":"99.5"},{"Timestamp":"2017-03-22 14:37:26","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:36:24","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:35:22","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:34:20","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:33:18","ExternalLight":"99.8"},{"Timestamp":"2017-03-22 14:32:16","ExternalLight":"99.7"},{"Timestamp":"2017-03-22 14:31:14","ExternalLight":"99.6"},{"Timestamp":"2017-03-22 14:30:12","ExternalLight":"99.5"},
.......
I am looking for the most efficient way to identify the two specific changes - where it gets light in the morning, and where it gets dark in the evening. Would it be possible to do this using a MySQL query? Or will I need to select all of the data and process it using PHP? I am not really sure of the best way to start, so I am looking for some guidance!
Many thanks,
Chris

The short answer is that this is a data analysis problem and neither MySQL nor PHP is a good fit. I generally wouldn't suggest trying to do something like this in PHP, and I seriously doubt it is even possible in MySQL alone. A language designed for data analysis and processing would work much better. Personally, I use Python for these kinds of tasks, since it has excellent tools like numpy/scipy/matplotlib. What did you make your plot in? If that is an actual programming language, it might be a good choice.
The thing to do is to figure out the algorithm you will use to measure these things, and then figure out how to implement that algorithm in your language of choice. The reality is that the question you are asking is more complicated than it might seem at first glance. Taking a time series and building an algorithm that reliably extracts information like "sunrise" and "sunset" can be surprisingly involved, especially once you account for things like variable weather conditions. This is probably a fairly straightforward problem to start with, but what you really need, from the sounds of it, is not help selecting a technology (i.e. MySQL or PHP) but help building an actual algorithm. This may or may not be the best place for that. Have you done any data analysis before?
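That said, if a rough answer is good enough, a simple threshold test is easy to express in MySQL itself. Below is a minimal, hedged sketch: it assumes a table called light_readings with the Timestamp and ExternalLight columns from the sample above, an arbitrary 50% "daylight" threshold, and a PDO connection, and it simply takes the first and last reading of the day above that threshold as "gets light" and "gets dark".

<?php
// Hypothetical sketch: the table and column names are assumed from the JSON
// sample above, and the 50% "daylight" threshold is arbitrary.
$pdo = new PDO('mysql:host=localhost;dbname=sensors', 'user', 'pass');
$day = '2017-03-22';
$threshold = 50.0;

$stmt = $pdo->prepare(
    "SELECT MIN(Timestamp) AS first_light, MAX(Timestamp) AS last_light
     FROM light_readings
     WHERE DATE(Timestamp) = :day AND ExternalLight > :threshold"
);
$stmt->execute([':day' => $day, ':threshold' => $threshold]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

echo "Got light at {$row['first_light']}, got dark at {$row['last_light']}\n";

A fixed threshold will misbehave on overcast days, which is exactly the point above about needing a real algorithm; smoothing the series first (for example a moving average computed in PHP) before applying the threshold is a common next step.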

Related

Is it faster to dig data out using mysql_fetch_array or to have a CSV string and separate the values using PHP?

I'm working on a website that is supposed to show the specifications of computers, for example: CPU, CPU speed, RAM and things like that. There are a lot of fields, and I wonder whether PHP's mysql_fetch_array would be faster, or whether saving all the data in one varchar field and separating it using PHP would be. I'm also wondering if there are any pros and cons to either approach?
I'm using PHP 5.3 and MySQL.
Thanks in advance.
If I were you I'd use a database, but with everything in separate fields and tables (a relational design), backed by an entity-relationship diagram plus normalization. Doing that gives you scalability and independence and avoids redundancy; besides, if you add good indexes (PK, FK and UK) you can get good performance. Of course that will also depend on your queries and your PHP code, but implementing the right DB design (plus the security side of things) is good practice.
The pros are A LOT... the cons: it will take you more time, but it will be worth it.
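To make that concrete, here is a small hedged sketch of what "separate fields and tables plus indexes" could look like for computer specifications - the table names, column names and PDO connection are invented for illustration (PDO is used here for brevity, not because the question requires it):

<?php
// Hypothetical sketch of the normalized layout: one row per computer and
// one row per spec value, instead of a single CSV varchar column.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$pdo->exec("CREATE TABLE computer (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
)");

$pdo->exec("CREATE TABLE spec (
    computer_id INT UNSIGNED NOT NULL,
    spec_name   VARCHAR(50)  NOT NULL,  -- e.g. 'cpu', 'cpu_speed', 'ram'
    spec_value  VARCHAR(100) NOT NULL,
    PRIMARY KEY (computer_id, spec_name),
    FOREIGN KEY (computer_id) REFERENCES computer(id)
)");

// Pull one machine's specs as rows, no string splitting needed:
$stmt = $pdo->prepare("SELECT spec_name, spec_value FROM spec WHERE computer_id = ?");
$stmt->execute([1]);
$specs = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);   // ['cpu' => 'i5', 'ram' => '8GB', ...]

Compared with one CSV varchar, this lets you query or index an individual spec (say, all machines with 8GB of RAM) without any string splitting in PHP.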
I doubt the speed difference is ever going to matter. What will matter is whether or not you have a regular structure and the ability to freely query your data. Put it in the database and figure out how to speed things up later, when/if it ever becomes an issue, rather than making poor design decisions up front because you think you might need more performance.

Should I use a more constrained SQL query or handle the result set in code?

I'm a pretty new PHP developer, yet I have some good experience with T-SQL, so I'm more than capable of building some nice SQL queries. I'm building a PHP app and have run into this issue on a number of occasions.
When I'm up against a problem where I need to pull data from a MySQL database and process the output via PHP, should I simply SELECT * for the whole result set and pick out what I need via PHP? IMO it makes more sense to put as much work in SQL as possible, but I don't know.
In terms of performance and "best practice", what is generally best: rely on MySQL to do the bulk of the work and let PHP pick up the output, or let PHP do the majority of the work?
You should generally never apply the wild-card to grab all columns. It is far better practice to retrieve only the columns you are actually interested in using. This has implications for:
Performance: less overhead by only fetching and transferring data that is required
Maintainability: it is much more obvious what your query is fetching. For example, if you later add columns to your table, a SELECT * might start returning data you never expected.
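As a small hedged example (the story table, its columns and the PDO connection are invented), the explicit column list also documents exactly what the page relies on:

<?php
// Hypothetical "story" table: fetch only the columns the page needs
// instead of SELECT *, so columns added later aren't dragged along.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare(
    "SELECT headline, published_at
     FROM story
     WHERE category = ?
     ORDER BY published_at DESC"
);
$stmt->execute(['Sports']);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['headline'], ' (', $row['published_at'], ")\n";
}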
If you don't need all of the columns from the table, specify the ones you want explicitly. This is better performance-wise, not to mention more readable.
SQL will almost always perform better than PHP for this kind of work, but it really depends on the query being run. The more complex it is, the more likely SQL will be the better answer. SQL was built around 'getting you the right data quickly', whereas PHP was built around being a general-purpose scripting language.
In general it's considered best practice to return specific columns. Performance is not a huge concern, unless you're dealing with tables that contain BLOBs (also consider if the table might have one in the future), but you might find it easier to deal with smaller return objects.
Of course, if you're in development mode, rather than maintenance mode, you might find SELECT * preferable if you're frequently adding or removing columns.
In my experience the time needed to run a * query versus a specific-field query is dwarfed by the overall request time, at least if this is done over the web.
I.e., in my testing with websites hosted both in the US and overseas, I generally see that the difference between select fld,fld,fld and select * is negligible unless you have large table rows and lots of rows.
I personally think this is somewhat of a judgement call based on the size of your system and tables, and whether your tables and columns are in any kind of a state of flux. If I were you, not knowing the size of your system, I'd go with select *. Clearly that is not the "best practice", but I think it really needs to be decided on a system-by-system basis.
I'm currently a member of a team on a HUGE C#, WCF, WPF, SQL project where they went with the "best practices" everywhere, and what we have is a huge mess. In some cases so much effort was spent following the best practice that the result has been far more work to maintain it. We have interfaces by the bazillion, as well as wrappers, adapters, Unity, etc. Right now making one small change touches 15 - 20 files. I realize this is slightly off topic, but don't blindly follow a best practice just because it's there.

Learning about MySql for large database/tables?

I've been working on a new site of mine for a couple of days now which will be retrieving almost all of its most-used content from a MySQL database. Seeing as the database and website are still under development, the tables are really small at the moment and speed is of no concern yet.
But you know what they say, a little bit of hard work now saves you a headache later on.
Now, I'm only 17; the only database I've ever been taught was through Microsoft Access, and we were practically given the database already completed - we learned up to 3NF, but that was about it.
I remember reading once, when I was looking to pull data (randomly) out of a database, how large databases were taking several seconds or minutes to complete a single query, so this just got me thinking. In a fraction of a second I can submit a search to Google, Google processes the query and returns the result, and then my browser renders it - all done in the blink of an eye. And Google has billions of records to search through, and they're also doing this for millions of users simultaneously.
I'm thinking, how do they do it? I know that they have huge data centers, but still.
I realize that it probably comes down to the design of the database, how it's been optimized, and obviously the configuration. And I guess that's my question really. Could someone please tell me how to design high performance databases for millions/billions of rows (yes, I'm being optimistic), and possibly point me towards some good reading material to help me learn further?
Also, all my queries are done via PHP, if that's at all relevant to any answers.
The blog http://highscalability.com/ has some good articles and pointers to how companies handle large problems.
Specifically related to MySQL, you can Google for craigslist.org's use of MySQL.
http://www.slideshare.net/jzawodn/mysql-and-search-at-craigslist
First the good news... MySQL scales well (depending on the hardware) to at least hundreds of millions of rows.
Once you get to a certain point, a single database server will have trouble managing the load. That's when you get into the realm of partitioning or sharding... spreading the load across multiple database servers using any one of a number of different schemes (e.g. putting unrelated tables on different servers, or spreading a single table across multiple servers by using the ID or a date range as a partitioning key).
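As a hedged illustration of the "spread a single table across servers by ID" idea - the host names, the modulo scheme and the orders table are all invented, and real sharding setups are usually far more elaborate:

<?php
// Hypothetical application-level sharding: pick a database server based on
// the user ID, so each server holds only a slice of the "orders" table.
$shards = [
    0 => 'mysql:host=db0.example.com;dbname=app',
    1 => 'mysql:host=db1.example.com;dbname=app',
    2 => 'mysql:host=db2.example.com;dbname=app',
];

function shardFor($userId, array $shards) {
    $dsn = $shards[$userId % count($shards)];   // simple modulo partitioning key
    return new PDO($dsn, 'user', 'pass');
}

$userId = 421337;
$pdo = shardFor($userId, $shards);
$stmt = $pdo->prepare("SELECT * FROM orders WHERE user_id = ?");
$stmt->execute([$userId]);

The catch, as the rest of this answer points out, is that cross-shard queries, schema changes and backups all become your problem once the data is split this way.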
MySQL can be sharded, but it is not fundamentally designed to shard well. There's a whole category of storage alternatives, collectively referred to as NoSQL, that are designed to solve that very problem (MongoDB, Cassandra and HBase are a few).
When you use SQL at very large scale, you run into any number of issues such as making data model changes across a DB server farm, trouble keeping up with data backups, etc. That's a very complex topic, and people that solve it well are rare. For a glimpse at the issues, have a look at http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/
When selecting a database platform for a specific project, benchmark the solution early and often to understand whether or not it will meet the performance requirements that you envision. Having a framework to do that will help you learn about scalability, and will help you decide whether to invest effort in improving the data storage part of your solution, and will help you know where best to invest your time.
No one can simply tell you how to design databases; it comes after much reading and many hours working on them, and a good design is the product of many years of doing it. As you've only seen Access, you don't have much database background yet. Search through Amazon.com and you'll get tons of titles; for someone who's starting out, almost any of them will do.
I mean no disrespect. I've been there, and I'm also a tutor of some people learning programming/database design. I do know that there's no silver bullet or shortcut for the work you have ahead.
If you intend to work with high-performance databases, you should keep something in mind: their design is per application. A good design depends on learning more and more about how the app's users interact with the system, the usage patterns, etc. The things you'll learn from books will give you options; using them will depend heavily on the scenario.
Good luck!
It doesn't all come down to the design of the database, though that is indeed a big part of it. The guys who made Google are geniuses, and if I'm not completely wrong about Google, you won't be able to find out exactly how they do what they do. Also, I know that years back they had more than 10,000 computers processing queries, and today they probably have many more. I also suspect they cache most of the recent/popular keywords. And all the websites have been indexed and analyzed using an unknown algorithm which makes sure the computers don't have to look through all the words on every page.
In fact, Google crawls the entire internet around every 14 days, so when you do a search you are not searching the live internet. Your search gets broken down into keywords, and these keywords are used to narrow down the number of relevant pages - and I'm pretty sure all pages have already been analyzed for important and/or relevant keywords before you even thought of visiting google.com.
Have a look at this question.
Have a look into Sphinx server.
http://sphinxsearch.com/
Craigslist uses that for their search engine. Basically, you give it a source and it indexes whatever you want (mysql database/table, text files, etc.). If it works for craigslist, it should work for you.

Where should I do the calculating stuff, PHP or MySQL?

I've been doing a lot of calculation-heavy work lately. Usually I prefer to do these calculations in PHP rather than MySQL, even though I know PHP is not good at this - I thought MySQL might be worse. But I have run into performance problems: some pages load so slowly that the 30-second time limit is not enough for them! So I wonder which is the better place to do the calculations, and whether there are any principles for deciding? Suggestions would be appreciated.
Anything that can be done using an RDBMS (grouping, summing, averaging), where the data can be filtered on the server side, should be done in the RDBMS.
If the calculation is genuinely better suited to PHP then fine, go with that, but otherwise don't try to do in PHP what an RDBMS was made for. YOU WILL LOSE.
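As a hedged example (the orders table and its columns are invented), compare letting MySQL group and sum with pulling every row into PHP just to add them up:

<?php
// Hypothetical "orders" table. MySQL does the grouping and summing, so PHP
// receives one small row per customer instead of every order row.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$totals = $pdo->query(
    "SELECT customer_id, SUM(amount) AS total, AVG(amount) AS average
     FROM orders
     GROUP BY customer_id"
)->fetchAll(PDO::FETCH_ASSOC);

// The PHP-only alternative - SELECT customer_id, amount FROM orders, then
// looping over every row to accumulate sums - moves far more data around.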
I would recommend doing any row level calculations using the RDBMS.
Not only are you going to benefit from better performance, but it also makes your application more portable if you ever need to switch to another scripting language, say from PHP to Python, because you've already sorted, filtered and processed the data in your RDBMS.
It also helps separate your application logic; it has helped me keep my controllers cleaner and neater when working in an MVC environment.
I would say do calculations in languages that were created for that, like C++. But if you have to choose between MySQL and PHP, PHP is better.
Just keep track of where your bottlenecks are. If your table gets locked up because you're trying to run some calculations, everyone else is queued up waiting to read/write data in the selected tables, and the queue will continue to grow.
MySQL is typically faster at processing your commands, but PHP should be able to handle simple problems without too much of a fuss. Of course, that does not mean you should be pinging your database multiple times for the same calculation over and over.
You might be better off caching your results if you can and have a cron job updating it once a day/hour (please don't do it every minute, your hosting provider will probably hate you).
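A hedged sketch of that cron idea - the file path, query and schedule are invented - is a small script that precomputes the expensive result so the pages only read a file:

<?php
// cache_refresh.php - hypothetical script run from cron, e.g. hourly:
//   0 * * * * php /var/www/cache_refresh.php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$totals = $pdo->query(
    "SELECT category_id, COUNT(*) AS views
     FROM page_views
     GROUP BY category_id"
)->fetchAll(PDO::FETCH_ASSOC);

// Write to a temp file and rename, so readers never see a half-written cache.
file_put_contents('/tmp/view_totals.tmp', json_encode($totals));
rename('/tmp/view_totals.tmp', '/tmp/view_totals.json');

The pages themselves then just call json_decode(file_get_contents('/tmp/view_totals.json'), true) instead of re-running the aggregate query on every request.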
Do as much filtering and merging as possible to bring the minimum amount of data into php. Once you have that minimum data set, then it depends on what you are doing, server load, and perhaps other factors.
If you can do something equally well in either, and the sql is not overly complex to write (and maintain) then do that. For simple math, sql is usually a good bet. For string manipulations where the strings will end up about the same length or grow, php is probably a good bet.
The most important thing is to request as little data as possible. The manipulation of the data, at least what sql can do, is secondary to retrieving and transferring the data.
Native MySQL functions are very quick. So do what makes sense in your queries.
If you have multiple servers (i.e., a web server and a DB server), note that DB servers are much more expensive than web servers. So if you have a lot of traffic or a very busy DB server, don't do the 'extras' there that can be handled just as easily/efficiently on a web server machine, to help prevent slowdowns.
cmptrgeekken is right, we would need some more information. But if you need to do calculations that pertain to database queries, or operations such as comparing certain fields from the database, make the database do it. Doing that work in SQL is cheaper (as far as time is concerned, and the database is optimized for it). Since PHP and MySQL are both server side, it won't really matter where you do the calculations otherwise. But like I said before, if they are operations on database information, write a more complicated SQL query and use that.
Use PHP; don't lag up your MySQL with endless calculations. If you're talking about things like sorting, it's OK to use MySQL for stuff like that, along with SUM and AVG, but don't overdo it.

Faster to query in MYSQL or to use PHP logic

I have a page that will pull many headlines from multiple categories based off a category id.
I'm wondering if it makes more sense to pull all the headlines and then sort them out via PHP if/elseif statements, or whether it is better to run multiple queries that each return the headlines for one category.
Why not do it in one query? Something like:
SELECT headline FROM headlines WHERE category_id IN (1, 2, 3, ...);
If you filter your headlines in PHP, think how many you'll be throwing away. If you end up with removing just 10% of the headlines, it won't matter as much as when you'd be throwing away 90% of the results.
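A hedged sketch of that single-query approach (the table and columns follow the example query above; the PDO connection and the PHP grouping step are assumptions about how you would display it):

<?php
// One round trip: fetch the headlines for every wanted category, then group
// them by category in PHP for display.
$pdo = new PDO('mysql:host=localhost;dbname=news', 'user', 'pass');
$categoryIds = [1, 2, 3];

$placeholders = implode(',', array_fill(0, count($categoryIds), '?'));
$stmt = $pdo->prepare(
    "SELECT category_id, headline
     FROM headlines
     WHERE category_id IN ($placeholders)"
);
$stmt->execute($categoryIds);

$byCategory = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $byCategory[$row['category_id']][] = $row['headline'];
}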
These kinds of questions are always hard to answer, because the situation determines the best course. There is never a truly correct answer, only better ways. In my experience it doesn't really matter whether you attempt to do the work in PHP or in the database, because you should always try to cache the results of any expensive operation using a caching engine such as memcached. That way you are not going to spend a lot of time in the db or in php itself, since the results will be cached and ready instantaneously for use. When it comes down to it, unless you profile your application using a tool like Xdebug, what you think are your performance bottlenecks are just guesses.
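A hedged sketch of the memcached idea, using the PHP Memcached extension (the key name, TTL, query and server address are invented):

<?php
// Read-through cache: serve the headline list from memcached when possible
// and only hit MySQL when the cached copy is missing or has expired.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key = 'headlines:front_page';
$headlines = $cache->get($key);

if ($headlines === false) {                       // cache miss or expired
    $pdo = new PDO('mysql:host=localhost;dbname=news', 'user', 'pass');
    $headlines = $pdo->query("SELECT category_id, headline FROM headlines")
                     ->fetchAll(PDO::FETCH_ASSOC);
    $cache->set($key, $headlines, 300);           // keep it for 5 minutes
}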
It's usually better not to overload the DB, because you might cause a bottleneck if you have many simultaneous queries.
However, handling your processing in PHP is usually better, as Apache will fork threads as it needs to handle multiple requests.
As usual, it all comes down to: "How much traffic is there?"
MySQL can already do the selecting and ordering of the data for you. I suggest you be lazy and use this.
Also, I'd look for a single query that fetches all the categories and their headlines at once. Would an ORDER BY category, publishdate or something like that do?
Every trip to the database costs you something. Returning extra data that you then decide to ignore costs you something. So you're almost certainly better to let the database do your pruning.
I'm sure one could come up with some case where deciding what data you need makes the query hugely complex and thus difficult for the database to optimize, while you could do it in your code easily. But if we're talking about "select headline from story where category='Sports'" followed by "select headline from story where category='Politics'" then "select headline from story where category='Health'" etc, versus "select category, headline from story where category in ('Health','Sports','Politics')", the latter is clearly better.
On the topic of "Faster to query in MySQL or to use PHP logic", which is how I ended up on this question 10 years later: I have determined that the correct answer is "it depends". There are just too many examples where using the DB saves processing time over writing PHP code... but there are just as many examples where writing PHP code saves time over excessively complex MySQL queries.
There is no right answer here. If you end up here, like I did, then the best I can suggest is to try to solve your problem with the skills that you have. Start with the query and try to solve it; if you run into issues, then start thinking about just gathering the data and running the logic through PHP code to come up with a solution.
At the end of the day, you need to solve a problem... if you solve it, but it's not fast enough, then that's another problem - work on optimizing, which may end up meaning that you go back to writing more MySQL logic.
Use the 80/20 rule and try to get things 80% of the way there as quickly as possible. You can go back and optimize once it's workable. Spending all your effort on making it perfect the first time will surely mean you miss your deadline.
That's my $0.02.
