Multi-cURL 5000 URLs - php

I need to check for broken images whose URLs are stored in DB entries. Right now I am selecting all the items from the table and using cURL to check whether each one is broken or not. I have almost 5000 items in the DB and cURL is taking a lot of time; for a single URL it reports a total time of about 0.07 seconds. My table structure is the following:
+----+----------------------------------------+
| id | image_url                              |
+----+----------------------------------------+
| 1  | http://s3.xxx.com/images/imagename.gif |
| 2  | http://s3.xxx.com/images/imagename.gif |
| 3  | http://s3.xxx.com/images/imagename.gif |
| 4  | http://s3.xxx.com/images/imagename.gif |
+----+----------------------------------------+
So is there any other way to check for broken images? I don't think I can use LIMIT here, since I need to check all the items and then print the result. I have also tried file_get_contents(), but it is just as slow.

What you can do here is the following:
Use curl_multi to request the images in parallel.
Request the headers only (you're not interested in the image data); if the status code is anything but 200 OK (or 302 Found), the image does not exist.
Chunk the 5000 items first rather than running them all through curl_multi at once; about 50-100 items at a time is fine. A sketch follows below.
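A minimal sketch of that approach, assuming a PDO connection and a table called images (the DSN, credentials, and table name are assumptions, so adjust them to your setup):

$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$urls = $pdo->query('SELECT id, image_url FROM images')->fetchAll(PDO::FETCH_KEY_PAIR);

$broken = [];
foreach (array_chunk($urls, 100, true) as $chunk) {    // 100 URLs per batch
    $mh      = curl_multi_init();
    $handles = [];

    foreach ($chunk as $id => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);         // headers only, skip the image body
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // a 302 that resolves to 200 counts as OK
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $ch);
        $handles[$id] = $ch;
    }

    // Run all handles in this chunk until they have finished.
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);
    } while ($running > 0);

    foreach ($handles as $id => $ch) {
        if (curl_getinfo($ch, CURLINFO_HTTP_CODE) !== 200) {
            $broken[$id] = $chunk[$id];                 // anything but 200 is treated as broken
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
}

print_r($broken); // ids and URLs of the images that did not return 200

This keeps at most 100 requests in flight per batch, so the 5000 URLs are covered in 50 batches instead of 5000 sequential requests.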

Related

Footable or Datatables with 10000 rows

I need to show a table with 15 columns and about 10,000 rows. When I simply load the table without a LIMIT, it loads too slowly, because at first all rows are rendered and only afterwards does DataTables hide them. Maybe there is a solution for this case?
But I need that it shows all pages count for example:
| < Previous | 1 | 2 | 3 | 4 | 5 | ... | 690 | Next > |
Now I'm using a simple foreach ($res as $r) { echo "<tr><td>$r->id</td></tr>"; }
In addition, I would like to add that the first column contains a script for a popup, like:
echo '<td>'.$rowid.'</td>';
If you work with DataTables and a data set as large as yours, you need to load the data with server-side processing, as shown here:
https://datatables.net/examples/server_side/simple.html
and you can also add this option
https://datatables.net/reference/option/deferRender
When returning this many rows you should probably be using the Ajax functionality of DataTables. More info can be found here:
https://datatables.net/examples/data_sources/ajax.html
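For illustration, the server-side endpoint that DataTables calls could look roughly like this in plain PHP. This is only a sketch: the table name, columns, and connection details are assumptions, and a real endpoint also has to apply the search and ordering parameters DataTables sends.

// Minimal server-side processing endpoint (assumed table "items" with id and name columns).
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$draw   = isset($_GET['draw'])   ? (int) $_GET['draw']   : 1;
$start  = isset($_GET['start'])  ? (int) $_GET['start']  : 0;
$length = isset($_GET['length']) ? (int) $_GET['length'] : 25;

$total = (int) $pdo->query('SELECT COUNT(*) FROM items')->fetchColumn();

// $start and $length are cast to int above, so interpolating them here is safe.
$rows = $pdo->query("SELECT id, name FROM items ORDER BY id LIMIT $start, $length")
            ->fetchAll(PDO::FETCH_NUM);

header('Content-Type: application/json');
echo json_encode([
    'draw'            => $draw,
    'recordsTotal'    => $total,
    'recordsFiltered' => $total, // no search filtering applied in this sketch
    'data'            => $rows,
]);

With this in place, only $length rows are queried per page, and deferRender keeps the client from building DOM nodes for rows that are never shown.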

Importing large CSV files in MySQL using Laravel

I have a csv file that can range from 50k to over 100k rows of data.
I'm currently using Laravel w/ Laravel Forge, MySQL, and Maatwebsite Laravel Excel package.
This is to be used by an end-user and not myself so I have created a simple form on my blade view as such:
{!! Form::open(
array(
'route' => 'import.store',
'class' => 'form',
'id' => 'upload',
'novalidate' => 'novalidate',
'files' => true)) !!}
<div class="form-group">
<h3>CSV Product Import</h3>
{!! Form::file('upload_file', null, array('class' => 'file')) !!}
</div>
<div class="form-group">
{!! Form::submit('Upload Products', array('class' => 'btn btn-success')) !!}
</div>
{!! Form::close() !!}
This then stores the file on the server successfully and I'm now able to iterate through the results using something such as a foreach loop.
Now here are the issues I'm facing in chronological order and fixes/attempts:
(10k rows test csv file)
[issue] PHP times out.
[remedy] Changed it to run asynchronously via a job command.
[result] Imports up to 1500 rows.
[issue] Server runs out of memory.
[remedy] Added a swap drive of 1gb.
[result] Imports up to 3000 rows.
[issue] Server runs out of memory.
[remedy] Turned on chunking results of 250 rows each chunk.
[result] Imports up to 5000 rows.
[issue] Server runs out of memory.
[remedy] Removed some transposing/joined-tables logic.
[result] Imports up to 7000 rows.
As you can see, the gains are marginal and nowhere near 50k; I can barely even get near 10k.
I've read up and looked into possible suggestions such as:
Use a raw query to run Load Data Local Infile.
Split files before importing.
Store on server then have server split into files and have a cron process them.
Upgrade my 512mb DO droplet to 1gb as a last resort.
Going with LOAD DATA LOCAL INFILE may not work because my header columns can change per file, which is why I have logic to process/iterate through them.
Splitting files before importing is fine under 10k, but for 50k or more? That would be highly impractical.
Store on the server and then have the server split the file and process the pieces individually without troubling the end-user? Possibly, but I'm not even sure how to achieve this in PHP yet; I've only briefly read about it.
Also to note: my queue worker is set to time out after 10000 seconds, which is also very impractical and bad practice, but it seemed to be the only way it would keep running before memory takes a hit.
Now I could give in and just upgrade the memory to 1GB, but I feel that at best it would get me to 20k rows before failing again. Something needs to process all these rows quickly and efficiently.
Lastly here is a glimpse of my table structure:
Inventory
+----+------------+-------------+-------+---------+
| id | profile_id | category_id | sku   | title   |
+----+------------+-------------+-------+---------+
| 1  | 50         | 51234       | mysku | mytitle |
+----+------------+-------------+-------+---------+
Profile
+----+---------------+
| id | name          |
+----+---------------+
| 50 | myprofilename |
+----+---------------+
Category
+----+------------+--------+
| id | categoryId | name   |
+----+------------+--------+
| 1  | 51234      | brakes |
+----+------------+--------+
Specifics
+----+---------------------+------------+-------+
| id | specificsCategoryId | categoryId | name  |
+----+---------------------+------------+-------+
| 1  | 20                  | 57357      | make  |
| 2  | 20                  | 57357      | model |
| 3  | 20                  | 57357      | year  |
+----+---------------------+------------+-------+
SpecificsValues
+----+-------------+-------+--------+
| id | inventoryId | name  | value  |
+----+-------------+-------+--------+
| 1  | 1           | make  | honda  |
| 2  | 1           | model | accord |
| 3  | 1           | year  | 1998   |
+----+-------------+-------+--------+
Full CSV Sample
+----+------------+-------------+-------+---------+-------+--------+------+
| id | profile_id | category_id | sku   | title   | make  | model  | year |
+----+------------+-------------+-------+---------+-------+--------+------+
| 1  | 50         | 51234       | mysku | mytitle | honda | accord | 1998 |
+----+------------+-------------+-------+---------+-------+--------+------+
So a quick run-through of my logic workflow as simple as possible would be:
Load file into Maatwebsite/Laravel-Excel and iterate through a chunked loop
Check whether category_id and sku are empty; if so, skip the row and log an error to an array.
Look up category_id and pull all relevant column fields from all the related tables it uses, then, if none are null, insert into the database.
Generate a custom title using more logic using the fields available on the file.
Rinse and repeat.
Lastly export the errors array into a file and log it into a database for download to view errors at the end.
I hope someone can share some insight on how I should tackle this, keeping in mind that I'm using Laravel and that this isn't a simple upload: each line has to be processed and inserted into several related tables, otherwise I'd just LOAD DATA INFILE it all at once.
Thanks!
You seem to have already figured out the logic for interpreting the CSV lines and converting them to insert queries on the database, so I will focus on the memory exhaustion issue.
When working with large files in PHP, any approach that loads the entire file into memory will either fail, become unbearably slow, or require a lot more RAM than your Droplet has.
So my advice is:
Read the file line by line using fgetcsv
$handle = fopen('file.csv', 'r');
if ($handle) {
    while (($line = fgetcsv($handle)) !== false) {
        // Process this line and save it to the database
    }
    fclose($handle);
}
This way only one row at a time is loaded into memory. You can then process it, save it to the database, and overwrite it with the next one.
Keep a separate file handle for logging
Your server is short on memory, so logging errors to an array may not be a good idea, since every error would be kept in memory. That can become a problem if your CSV has lots of entries with empty SKUs and category IDs.
Laravel comes out of the box with Monolog and you can try to adapt it to your needs. However, if it also ends up using too much resources or not fitting your needs, a simpler approach may be the solution.
$log = fopen('log.txt', 'w');

if ($someCondition) {
    fwrite($log, $text . PHP_EOL);
}
Then, at the end of the script you can store the log file wherever you want.
Disable Laravel's query log
Laravel keeps all your queries stored in memory, and that's likely to be a problem for your application. Luckily, you can use the disableQueryLog method to free some precious RAM.
DB::connection()->disableQueryLog();
Use raw queries if needed
I think it's unlikely that you will run out of memory again if you follow these tips, but you can always sacrifice some of Laravel's convenience to extract that last drop of performance.
If you know your way around SQL, you can execute raw queries to the database.
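For example, a single raw, parameterised insert into the Inventory table from the question could look like the sketch below (the variable names are assumptions):

// Bypass Eloquent and insert directly with a raw, parameterised query.
DB::insert(
    'INSERT INTO inventory (profile_id, category_id, sku, title) VALUES (?, ?, ?, ?)',
    [$profileId, $categoryId, $sku, $title]
);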
Edit:
As for the timeout issue, you should be running this code as a queued task, as suggested in the comments, regardless. Inserting that many rows WILL take some time (especially if you have lots of indexes) and the user shouldn't be staring at an unresponsive page for that long.
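For illustration, wiring the import up as a queued job could look roughly like the sketch below. The class name is hypothetical and the exact boilerplate depends on your Laravel version, so treat this as an outline rather than drop-in code.

// In the controller, once the uploaded CSV has been saved to disk:
dispatch(new ProcessCsvImport($storedFilePath)); // hypothetical job wrapping the fgetcsv loop

// app/Jobs/ProcessCsvImport.php (skeleton)
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ProcessCsvImport implements ShouldQueue
{
    use InteractsWithQueue, Queueable, SerializesModels;

    protected $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    public function handle()
    {
        // Open $this->path with fgetcsv and run the per-line processing here.
    }
}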

PHP how to execute a script with SOAP requests without keeping the browser window open

I developed a PHP script that makes some calls to a SOAP web service.
When I get the results I modify the data and append it to a CSV file.
I have a MySQL table, FOUNDS, that contains all the IDs I use to query the web service:
FOUNDS
| ID   | status  |
| AADR | ok      |
| AAIT | ok      |
| AAXJ | pending |
| ACIM | pending |
I wrote a PHP page that reads from the FOUNDS table with a 10-row LIMIT, queries the SOAP service, gets the results, writes them to the CSV, and flags each processed row as "ok".
After looping over the retrieved rows, the script checks whether there are any other "pending" rows.
If the total number of "pending" rows is greater than zero, the page does a PHP Location redirect to itself; basically, the page reloads itself. Otherwise it redirects to another page. A rough sketch of this loop is shown below.
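For reference, the loop described above looks roughly like this (a sketch only: the SOAP endpoint, operation name, CSV fields, and connection details are assumptions):

// Process one batch of 10 pending rows, then reload the page or move on.
$pdo    = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$client = new SoapClient('https://example.com/service?wsdl'); // assumed endpoint

$ids = $pdo->query("SELECT ID FROM FOUNDS WHERE status = 'pending' LIMIT 10")
           ->fetchAll(PDO::FETCH_COLUMN);

$csv = fopen('results.csv', 'a');
foreach ($ids as $id) {
    $result = $client->__soapCall('GetData', [['id' => $id]]); // assumed operation name
    fputcsv($csv, [$id /* ...plus the fields derived from $result... */]);
    $pdo->prepare("UPDATE FOUNDS SET status = 'ok' WHERE ID = ?")->execute([$id]);
}
fclose($csv);

$pending = (int) $pdo->query("SELECT COUNT(*) FROM FOUNDS WHERE status = 'pending'")
                     ->fetchColumn();

header('Location: ' . ($pending > 0 ? 'this_page.php' : 'done.php')); // reload or finish
exit;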
When we started, I had only 100 IDs in the FOUNDS table.
Now I have 40,000 rows in the FOUNDS table and the process takes around 8-10 hours to complete.
How can I modify my script so that I don't need to keep my browser open and don't run into a timeout?

PHP Sum from Access using Array

I have a question involving PHP arrays and adding up the values.
I have an Access database with information in this layout:
----------------------
| time | code        |
|------|-------------|
| 600  | broke down  |
| 500  | broke down  |
| 300  | waiting     |
| 200  | waiting     |
| 400  | remove coil |
Anyway, you get the idea: there are multiple code values, and one code can have multiple time values. What I am trying to accomplish with PHP is to add up all the time values for each code and display each code only once.
The result I want would be:
1100 | broke down (600 and 500 added together)
500  | waiting (300 and 200 added together)
400  | remove coil
Just for example. I think I should be using a multidimensional array, but I just cannot seem to wrap my head around what to do exactly with it. A point in the right direction would be greatly appreciated.
Why not SQL?
SELECT Code, Sum([Time])
FROM Table
GROUP BY Code
Time is a reserved word, so I assume it is a placeholder name. If it is the real column name, you must enclose it in square brackets or, better, rename it.
You can actually use SQL for this to return what you're looking for.
SELECT SUM(time), code FROM table_name GROUP BY code
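If the aggregation really has to happen in PHP, a minimal sketch that sums into an associative array keyed by code might look like the following (the ODBC DSN and table name are assumptions):

// Sum the time values per code in PHP (assumed ODBC connection to the Access file).
$conn   = odbc_connect('Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\\data\\db.accdb', '', '');
$result = odbc_exec($conn, 'SELECT [time], code FROM MyTable');

$totals = [];
while ($row = odbc_fetch_array($result)) {
    $code = $row['code'];
    if (!isset($totals[$code])) {
        $totals[$code] = 0;
    }
    $totals[$code] += (int) $row['time'];
}

foreach ($totals as $code => $sum) {
    echo $sum . ' | ' . $code . PHP_EOL; // e.g. "1100 | broke down"
}

The GROUP BY query is still the better option when the database can do the work; this is just the array-based equivalent.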

How to effectively execute this cron job?

I have a table with 200 rows. I'm running a cron job every 10 minutes to perform an insert/update operation on the table. The operation needs to be performed on only 5 rows at a time each time the cron job runs. So in the first 10 minutes records 1-5 are updated, records 6-10 in the 20th minute, and so on.
After the 40th run, all the records in the table would have been updated exactly once. This is what is to be achieved, at least. And the next cron job should start the process over again.
The problem:
is that, every time a cron job runs, the insert/update operation should be performed on N rows (not just 5 rows). So, if N is 100, all records would've been updated by just 2 cron jobs. And the next cron job would repeat the process again.
Here's an example:
This is the table I currently have (200 records). Every time a cron job executes, it needs to pick N records (which I set as a variable in PHP) and update the time_md5 field with the current time's MD5 value.
+-----+----------------------------------+
| id  | time_md5                         |
+-----+----------------------------------+
| 10  | 971324428e62dd6832a2778582559977 |
| 72  | 1bd58291594543a8cc239d99843a846c |
| 3   | 9300278bc5f114a290f6ed917ee93736 |
| 40  | 915bf1c5a1f13404add6612ec452e644 |
| 599 | 799671e31d5350ff405c8016a38c74eb |
| 56  | 56302bb119f1d03db3c9093caf98c735 |
| 798 | 47889aa559636b5512436776afd6ba56 |
| 8   | 85fdc72d3b51f0b8b356eceac710df14 |
| ..  | .......                          |
| ..  | .......                          |
| ..  | .......                          |
| ..  | .......                          |
| 340 | 9217eab5adcc47b365b2e00bbdcc011a | <-- 200th record
+-----+----------------------------------+
So the first record (id 10) should not be updated again until all 200 records have been updated once; only then should the process start over.
I have some idea on how this could be achieved, but I'm sure there are more efficient ways of doing it.
Any suggestions?
You could use a Red/Black system (like for cluster management).
Basically, all your rows start out as black. When your cron runs, it marks the rows it updated as red. Once all the rows are red, you switch and start turning the red rows back to black. You keep this alternation going, and it lets you mark rows so that you do not update any of them twice in one pass. (You could store the current target colour in a file or something so that it is shared between cron runs.) A sketch of this idea follows.
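A minimal sketch of that idea (the color column, table name, batch size, and connection details are assumptions):

// Red/Black marking: each run flips N rows from the current colour to the target colour.
$n      = 5;
$pdo    = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$target = trim(@file_get_contents('target_color.txt')) ?: 'red'; // shared between cron runs
$from   = ($target === 'red') ? 'black' : 'red';

$stmt = $pdo->prepare(
    "UPDATE my_table SET time_md5 = MD5(NOW()), color = :target WHERE color = :from LIMIT $n"
);
$stmt->execute([':target' => $target, ':from' => $from]);

// If no rows of the old colour remain, the pass is complete: flip the target for the next cycle.
$remaining = (int) $pdo->query("SELECT COUNT(*) FROM my_table WHERE color = '$from'")->fetchColumn();
if ($remaining === 0) {
    file_put_contents('target_color.txt', $from);
}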
I would just run the PHP script every 5 or 10 minutes with cron and then use PHP's time and date functions to perform the rest of the logic. If you cannot time it, you could store a position-marker variable in a small file, as in the sketch below.
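A sketch of that position-marker approach (the file name, table name, and row count are assumptions):

// Keep a rolling offset in a small file and update the next N rows by position each run.
$n      = 5;
$total  = 200; // number of rows in the table
$offset = (int) @file_get_contents('cron_offset.txt');

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
// $offset and $n are integers, so interpolating them here is safe.
$pdo->exec("UPDATE my_table
              JOIN (SELECT id FROM my_table ORDER BY id LIMIT $offset, $n) batch USING (id)
               SET my_table.time_md5 = MD5(NOW())");

file_put_contents('cron_offset.txt', ($offset + $n) % $total); // wrap around after a full pass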
