My situation is as follows.
Every day I receive 256 GB of product information from different online shops and content providers (e.g. the CNET datasource).
This information arrives as CSV, XML, and TXT files. The files are parsed and stored in MongoDB.
Later the information is transformed into a searchable form and indexed into Elasticsearch.
Not all 256 GB of information is different each day. Roughly 70% of it stays the same, and only a few fields such as Price, Size, and Name change frequently.
I am processing the files using PHP.
My problems are:
Parsing the huge amount of data
Mapping the fields inside the DB (e.g. "title" is not called "title" in every online shop; some provide it as Short-Title or under another name)
The gigabytes of information growing every day. How to store and process it all (maybe Big Data tools, but I am not sure how to use them)
Searching the information quickly across this much data.
Please suggest a suitable database for this problem.
Parsing huge data: Spark is the fastest distributed solution for your needs. Even though 70% of the data is the same each day, you still have to process it just to detect the duplicates, and you can do the field mapping there as well.
Data store: if you are doing any aggregation here, I would recommend HBase/Impala; if each product row is important to you, use Cassandra.
For searching, nothing is faster than Lucene, so use Solr or Elasticsearch, whichever you find more comfortable; both are good.
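Since the asker is already parsing feeds in PHP, here is a minimal sketch of the field-mapping step: streaming a CSV feed and renaming provider-specific columns to one canonical schema before anything reaches MongoDB or Elasticsearch. The provider names, mappings, and file name below are made up for illustration.

```php
<?php
// Sketch: normalise provider-specific column names to one canonical schema
// while streaming a feed file, so MongoDB and Elasticsearch only ever see
// one set of field names. Mappings and file names are hypothetical.

$fieldMap = [
    'cnet' => [
        'Short-Title' => 'title',
        'Price'       => 'price',
        'Size'        => 'size',
    ],
    'other_shop' => [
        'product_name' => 'title',
        'cost'         => 'price',
    ],
];

function normaliseRow(array $row, array $map): array
{
    $doc = [];
    foreach ($row as $field => $value) {
        // Keep unmapped fields under their original name, or drop them here.
        $doc[$map[$field] ?? $field] = $value;
    }
    return $doc;
}

$provider = 'cnet';
$handle   = fopen('cnet_feed.csv', 'r');   // hypothetical feed file
$headers  = fgetcsv($handle);

while (($values = fgetcsv($handle)) !== false) {
    $row = array_combine($headers, $values);
    $doc = normaliseRow($row, $fieldMap[$provider]);

    // Upsert $doc into MongoDB keyed on a stable product id here, so the
    // ~70% of products that did not change simply overwrite themselves
    // instead of piling up as new documents.
}
fclose($handle);
```

Keying the upsert on a stable product identifier is also what keeps the mostly unchanged 70% of the feed from inflating the collection day after day.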
I was asked to build an expense report framework which allows users to store their expenses, one at a time, via a web form. The number of entries will never be more than 100-200 per day.
In addition to the date and time, to be provided by the user, there must be a pre-defined set of tags (e.g.: transportation, lodging, food) to choose from for each new row of data, as well as fields for currency, amount and comments.
Afterwards, it must be possible (or rather, easy) to fetch the entries in the DB between two dates and load the data into a pandas data frame (or R data table) for subsequent statistical analysis and plotting.
I first thought about using PHP to insert the data in a mySQL database table, where the tags would be columns of booleans (True/False). The very simple web form would load by default with all tags set to False and it would be up to the user to turn the right ones to True prior to submission.
This said, I am now wondering about the other approaches I can or should explore. I've been reading about OpenTSDB and InfluxDB, which are designed to handle massive amounts of data, but I am also interested in hearing from coders who are up to date with the latest technologies about other possible options.
In short, I wish to choose a wise approach which is neither dated nor a (complex) cannon to kill a fly.
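For reference, here is a minimal sketch of the PHP/MySQL approach described above (tags as boolean columns), using PDO prepared statements plus a date-range query whose CSV output pandas or R can read directly. The table name, column names, and form fields are hypothetical.

```php
<?php
// Sketch of the "tags as boolean columns" idea with PDO.
// Each tag is a TINYINT(1) column defaulting to 0; names are illustrative.

$pdo = new PDO('mysql:host=localhost;dbname=expenses', 'user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$insert = $pdo->prepare(
    'INSERT INTO expense (spent_at, currency, amount, comment,
                          tag_transportation, tag_lodging, tag_food)
     VALUES (:spent_at, :currency, :amount, :comment, :transportation, :lodging, :food)'
);

$insert->execute([
    ':spent_at'       => $_POST['spent_at'] ?? date('Y-m-d H:i:s'),
    ':currency'       => $_POST['currency'] ?? 'EUR',
    ':amount'         => (float) ($_POST['amount'] ?? 0),
    ':comment'        => $_POST['comment'] ?? '',
    ':transportation' => isset($_POST['tag_transportation']) ? 1 : 0,
    ':lodging'        => isset($_POST['tag_lodging']) ? 1 : 0,
    ':food'           => isset($_POST['tag_food']) ? 1 : 0,
]);

// Date-range fetch for later analysis: dump to CSV, which pandas.read_csv
// or R's read.csv can ingest directly.
$select = $pdo->prepare(
    'SELECT * FROM expense WHERE spent_at BETWEEN :from AND :to ORDER BY spent_at'
);
$select->execute([':from' => '2015-01-01', ':to' => '2015-01-31']);

$out = fopen('php://output', 'w');
foreach ($select->fetchAll(PDO::FETCH_ASSOC) as $i => $row) {
    if ($i === 0) {
        fputcsv($out, array_keys($row));   // header row
    }
    fputcsv($out, $row);
}
```

At 100-200 rows per day this stays well within what a plain MySQL table handles comfortably, which is part of the "don't use a cannon to kill a fly" point above.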
You could try Axibase Time-Series Database Community Edition. It's free.
Supports tags for entities, metrics, and series
Provides open-source API clients for R, Python, and PHP
Range time-series query is a core use case
Check out the app examples you can easily build in PHP, Go, or Node.js. The application code is open source under the Apache 2 license and is hosted on GitHub.
Disclosure: I work for Axibase.
I would like to set up an online store and a point of sale application for a food coop.
My preference is PHP/MySQL, but I can't find any project which meets both of these requirements. I was wondering whether it would be possible to use separate store and POS apps and have them use the same product database.
The questions I have about this are:
Is it a bad idea?
Should one of the apps be modified to use the same tables as the other, or should there be a database replication process which maps the fields together (is this a common thing?)
Is it a bad idea?
The greatest danger might be that if someone successfully attacks your online store, the POS systems could be affected as well, e.g. by a DoS attack. That wouldn't keep me from taking this route, though.
Should one of the apps be modified to use the same tables as the other, or should there be a database replication process which maps the fields together (is this a common thing?)
If you can get at least one of the two systems to use the product data in read-only mode, I'd set up a number of views to translate between the different schemata without physically duplicating any data.
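To make the view idea concrete, here is a sketch that exposes the store's product table to the POS app under the column names the POS expects, without copying any data. Every table and column name below is hypothetical.

```php
<?php
// Sketch: a read-only MySQL view translating the store's schema into the
// column names the POS application expects. All names are made up.

$pdo = new PDO('mysql:host=localhost;dbname=coop', 'user', 'secret');

$sql = <<<SQL
CREATE OR REPLACE VIEW pos_products AS
SELECT
    p.id           AS product_id,
    p.title        AS name,         -- store calls it "title", POS expects "name"
    p.retail_price AS unit_price,
    p.stock_level  AS qty_on_hand
FROM store_products AS p
WHERE p.active = 1
SQL;

$pdo->exec($sql);

// The POS application then reads from pos_products as if it were its own
// table; write access stays restricted to the store's own schema.
```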
I have a system where users can 'like' content. There will likely be many hundreds of these likes going on at once. I'd like it to be AJAX driven so you get an immediate response.
At the moment I have a MySQL table of likes which contains the post_id and user_id, and I have a 'cached' counter on the posts table with the total number of likes - simple so far.
Would I benefit in any way, from storing any of this information in mongodb to take the load off of mysql?
At the moment, when I click like, two MySQL queries run - an INSERT into likes and an UPDATE on posts. If I'm in a large-scale environment with a heavy read/write load, what would be the best way to go?
Thanks in advance :)
MySQL isn't a good option for something like this, as a large number of writes will cause scaling issues. MongoDB's real advantage is schemaless, JSON-document-oriented storage, and while it should perform better than MySQL (if set up correctly), I think you should look at Redis for counters like this (the single INCR command for increasing a numeric value is the icing on the cake). In my experience it handles writes far more efficiently than any other database.
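A minimal sketch of that counter pattern, assuming the phpredis extension; the key names are invented for the example, and a per-post set of user ids guards against double-likes.

```php
<?php
// Sketch: Redis-backed like counter using the phpredis extension.
// Key names are illustrative.

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function likePost(Redis $redis, int $postId, int $userId): int
{
    // sAdd returns 1 only when the member was newly added, so a repeated
    // like from the same user does not bump the counter twice.
    if ($redis->sAdd("post:{$postId}:likers", $userId)) {
        return $redis->incr("post:{$postId}:likes");
    }
    return (int) $redis->get("post:{$postId}:likes");
}

$total = likePost($redis, 42, 1001);
echo "Post 42 now has {$total} likes\n";

// The MySQL likes table can still be written asynchronously (queue/cron)
// if a durable copy is needed for reporting.
```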
Sorry if this has been asked before, but I can't find anything about it on the forum.
I am making a shipping calculator in PHP, and I get CSV files from my courier with rates and places. My question is: is it better to read the CSV file into an array, or to import the CSV into a MySQL database and read the data that way?
If anyone has experience with this type of situation and wouldn't mind telling me the best way to go about it, that would be great.
I have not tried anything yet because I would like to know the best approach first.
Thanks for reading.
Won't this depend on how many times a day you need to access the data, and how often the shipping data is updated?
E.g. if the shipping data is updated daily and you access it 10,000 times per day, then yes, it would be worth importing it into a DB so you can do your lookups.
(This is the kind of job SQLite was designed for, by the way.)
If the shipping data is updated every minute, then you'd be best off grabbing it fresh every time.
If the shipping data is updated daily and you only access it 10 times, then I wouldn't worry too much - just grab the file, cache it, and access it as a PHP array, as in the sketch below.
Sorry, but I am not familiar with the data feed in question.
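A sketch of that grab-and-cache option: the CSV is parsed once into a PHP array, written out with var_export(), and only re-parsed when the courier's file changes. The file name and column layout are assumptions.

```php
<?php
// Sketch: cache the parsed courier CSV as a PHP array on disk and only
// re-read the CSV when the source file is newer than the cache.

function loadRates(string $csvFile, string $cacheFile): array
{
    // Reuse the cache while it is newer than the CSV it was built from.
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($csvFile)) {
        return include $cacheFile;
    }

    $rates  = [];
    $handle = fopen($csvFile, 'r');
    $header = fgetcsv($handle);               // e.g. place,weight_kg,price

    while (($row = fgetcsv($handle)) !== false) {
        $row = array_combine($header, $row);
        $rates[$row['place']][] = $row;       // index by destination for fast lookup
    }
    fclose($handle);

    // var_export() writes the array back out as valid PHP, so the next
    // request can include it without re-parsing the CSV.
    file_put_contents($cacheFile, '<?php return ' . var_export($rates, true) . ';');

    return $rates;
}

$rates = loadRates('courier_rates.csv', __DIR__ . '/rates.cache.php');
```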
I am writing a small geolocation service: when a user comes to my site, I need to set his town from his IP address. So far I have found three ways to solve this problem:
1. From PHP, connect to a MySQL DB and select the town from it.
2. From PHP, call a CGI script (Perl, C?) which selects the town from a file of towns and IP addresses.
3. Use a service like http://ipinfodb.com/ip_location_api.php and get the town from it.
But which way would be fastest? Minimal time, etc.?
Thanks!
Option 3, primarily because of just how much data you'd have to compile manually to do either option 1 or 2.
There is no easy answer, because a lot depends on unknown factors such as:
the speed of your MySQL DB
the speed of your PHP implementation and the size of the file
the speed of the ip_location_api service
In other words, there are only two ways to find out the answer:
build them all and test
gather all the parameters (speeds, bandwidth, concurrent users of all systems) and calculate/guesstimate.
I've used the MaxMind database for country-level lookups from PHP (there is example code for other languages too). The downloadable database is in a binary format optimised for read speed. Although I haven't compared it to importing the data into MySQL and searching with SQL, I have no doubt MaxMind are right when they say it is faster to use their API and the original data rather than going through another means such as SQL.
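A minimal sketch of that local-database lookup, assuming MaxMind's current GeoIP2 PHP library (geoip2/geoip2 via Composer) and a downloaded GeoLite2-City.mmdb file; the original answer predates this library, so treat the exact calls and the database path as assumptions.

```php
<?php
// Sketch: city lookup against a local MaxMind database, so no per-request
// network round-trip is needed. Paths and the sample IP are placeholders.

require 'vendor/autoload.php';

use GeoIp2\Database\Reader;

$reader = new Reader('/path/to/GeoLite2-City.mmdb');

$ip     = $_SERVER['REMOTE_ADDR'] ?? '128.101.101.101';
$record = $reader->city($ip);

echo $record->city->name . ', ' . $record->country->isoCode . "\n";
```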