Split data received into separate fields


Problem: The data supplied is not in a form I can handle directly.
Description: I'm using cURL to get the data I need from another website.
However, the data on this website is formatted in a way that I need to split before I can work with it.
Example:
I'm scraping a div (a soccer competition), but each match is supplied in one field, and I need the 2 teams separated into 2 fields.
After the scrape, the MySQL DB field holds:
team 1 - team 2
I need to separate that DB info into 2 columns.
preg_match would be able to do this, but there are many opponents, so it's a hassle this way.
Right now I have (manually, which is a pain) added a team_id column to the DB table, set it to myteam for the games that involve my team, and filtered with WHERE team_id = myteam.
This way I can at least sort the competition table and get only my team's games.
This is the data I'm talking about getting with curl:
$value['Wedstrijd'] = htmlentities($value['Wedstrijd'], ENT_QUOTES);
So right now I'm scraping the field and putting it in the db.
I'm wondering how I can separate the content of $value['Wedstrijd'] into 2 separate writes.
Is this even possible?
I'm not able to post the entire code here; the formatting gets screwed up.
Using substr and strrpos I was able to get the first team out of the field.
I thought setting the offset to -1 would give me the other way around, which it does, but it doesn't go back to the - symbol; it just gives me 1 letter/symbol.
substr($wedstrijd,0,strrpos($wedstrijd,'-')); returns the first team, but I'm not sure how to use this to get the team after the - symbol.

Use strpos() + substr(), or preg_match().
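A minimal sketch of the strpos() + substr() approach, assuming the teams are always separated by " - " (a hyphen with a space on each side) as in the team 1 - team 2 example. The function name splitMatch is hypothetical. It also covers the follow-up in the question: the second team starts at the separator's position plus the separator's length, not at a negative offset.

```php
<?php
// Split a scraped match string like "team 1 - team 2" into two team names.
// Assumption: teams are separated by " - " (hyphen surrounded by spaces), so a
// team name containing a bare hyphen (e.g. "Sint-Truiden") is not cut in half.
function splitMatch(string $wedstrijd): array
{
    $separator = ' - ';
    $pos = strpos($wedstrijd, $separator);
    if ($pos === false) {
        // No separator found: return the whole string as the first team.
        return [trim($wedstrijd), ''];
    }
    // Everything before the separator is the first team...
    $home = trim(substr($wedstrijd, 0, $pos));
    // ...and everything after it (skipping the separator itself) is the second team.
    $away = trim(substr($wedstrijd, $pos + strlen($separator)));
    return [$home, $away];
}

[$home, $away] = splitMatch('team 1 - team 2');
echo $home, PHP_EOL; // team 1
echo $away, PHP_EOL; // team 2
```

The two values can then be written to two separate columns. As an alternative, explode(' - ', $wedstrijd, 2) returns the same two parts in one call; the limit of 2 keeps any further " - " inside the second part.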

Related

Web based parts inventory design

I am a field service technician and I have an inventory of parts that is either issued to me by the company I work for or through orders for specific jobs. I am trying to design a website to manage my parts, both on-hand inventory and parts that have been returned or transferred to someone else. Here is the information I need to track:
part number (10 digits)
req number (8 digits, unique)
description (up to 50 characters)
location (Van or shed)
WorkOrder ("W" + 9 digits, e.g. 'W212141234')
BOL (15-digit bill of lading #)
TransferDate (date I get rid of the part)
TransferMethod (enum 'DEF','RTS','OBF')
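Since the fields above have fixed formats, the form that feeds the database can check them before inserting. A minimal sketch, assuming each record arrives as an associative array; the key names (part_number, work_order, and so on) and the function validatePart are hypothetical, and the transfer fields are treated as optional while a part is still on hand.

```php
<?php
// Validate one parts record against the field formats listed above.
// Assumption: key names are hypothetical; transfer fields may be null/absent
// while the part is still on hand.
function validatePart(array $p): array
{
    $errors = [];
    if (!preg_match('/\A\d{10}\z/', $p['part_number'] ?? '')) {
        $errors[] = 'part_number must be 10 digits';
    }
    if (!preg_match('/\A\d{8}\z/', $p['req_number'] ?? '')) {
        $errors[] = 'req_number must be 8 digits';
    }
    // strlen counts bytes; use mb_strlen instead if descriptions may be multibyte.
    if (strlen($p['description'] ?? '') > 50) {
        $errors[] = 'description is longer than 50 characters';
    }
    if (!in_array($p['location'] ?? null, ['Van', 'shed'], true)) {
        $errors[] = "location must be 'Van' or 'shed'";
    }
    // The remaining fields only get data once the part leaves inventory.
    if (isset($p['work_order']) && !preg_match('/\AW\d{9}\z/', $p['work_order'])) {
        $errors[] = "work_order must be 'W' followed by 9 digits";
    }
    if (isset($p['bol']) && !preg_match('/\A\d{15}\z/', $p['bol'])) {
        $errors[] = 'bol must be 15 digits';
    }
    if (isset($p['transfer_method'])
        && !in_array($p['transfer_method'], ['DEF', 'RTS', 'OBF'], true)) {
        $errors[] = 'transfer_method must be DEF, RTS or OBF';
    }
    return $errors;
}

$part = [
    'part_number' => '1234567890',
    'req_number'  => '12345678',
    'description' => 'Relay',
    'location'    => 'Van',
];
var_dump(validatePart($part) === []); // bool(true)
```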
I will probably use PHP to make a website and interact with the MySQL database.
What is the best design: a multi-table approach, or one table with web pages that display queries of only certain fields? I need a list of on-hand parts showing part number, req number, description, and location. I will also need a "defective returns" view that lists the parts I returned as DEF with all the remaining fields filled in.
Besides the "on hand" fields, the rest of the fields won't have data until the part is no longer "on hand".
I really appreciate any help, because I am new to both SQL and PHP. I have experimented with Ruby on Rails and Django, but I am not sure if I need to tackle all that at this point.

Even though you give some information on your issue, it is hard to actually approach it as the question itself on "what is the best design" is vague.
What I would do is this:
MYSQL TABLE DESIGN
Table parts
req number (int(8), unique, KEY)
part number (char(10) or bigint; int(10) only sets a display width, and 10-digit values can exceed the INT range)
description (varchar(50))
location (enum 'Van','shed')
WorkOrder (varchar(10))
BOL (varchar(15))
TransferDate (date)
TransferMethod (enum 'DEF','RTS','OBF')
onhand (boolean)
PHP SCRIPTS
And then I would make 2 PHP scripts, each with a single query and a table displaying the info:
onhand.php
SELECT `part number`, `req number`, description, location FROM parts WHERE onhand = 1;
notonhand.php
SELECT * FROM parts WHERE onhand = 0; (add AND TransferMethod = 'DEF' for the defective-returns view)
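The point of the single-table design is that each page is just a different filter over the same rows. As a database-free sketch (the rows and key names are made up for illustration; a real page would fetch them from MySQL), the on-hand list and the "defective returns" view from the question look like this in PHP:

```php
<?php
// Hypothetical in-memory rows standing in for the single `parts` table.
$parts = [
    ['req_number' => 10000001, 'part_number' => '1234567890', 'description' => 'Relay',
     'location' => 'Van', 'onhand' => true, 'transfer_method' => null],
    ['req_number' => 10000002, 'part_number' => '2234567890', 'description' => 'Fan motor',
     'location' => 'shed', 'onhand' => false, 'transfer_method' => 'DEF'],
    ['req_number' => 10000003, 'part_number' => '3234567890', 'description' => 'Filter',
     'location' => 'Van', 'onhand' => false, 'transfer_method' => 'RTS'],
];

// onhand.php: rows where the flag is set (WHERE onhand = 1).
$onHand = array_filter($parts, fn(array $p): bool => $p['onhand']);

// Defective returns: not on hand and returned as DEF
// (WHERE onhand = 0 AND TransferMethod = 'DEF').
$defReturns = array_filter(
    $parts,
    fn(array $p): bool => !$p['onhand'] && $p['transfer_method'] === 'DEF'
);

echo count($onHand), ' on hand, ', count($defReturns), " DEF return(s)\n"; // 1 on hand, 1 DEF return(s)
```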

When to use comma-separated values in a DB Column?

OK, I know the technical answer is NEVER.
BUT there are times when it seems to make things SO much easier, with less code and seemingly few downsides, so please hear me out.
I need to build a Table called Restrictions to keep track of what type of users people want to be contacted by and that will contain the following 3 columns (for the sake of simplicity):
minAge
lookingFor
drugs
lookingFor and drugs can contain multiple values.
Database theory tells me I should use a join table to keep track of the multiple values a user might have selected for either of those columns.
But it seems that using comma-separated values makes things so much easier to implement and execute. Here's an example:
Let's say User 1 has the following Restrictions:
minAge => 18
lookingFor => 'Hang Out','Friendship'
drugs => 'Marijuana','Acid'
Now let's say User 2 wants to contact User 1. Well, first we need to see if he fits User 1's Restrictions, but that's easy enough EVEN WITH the comma-separated columns, as such:
First I'd get the Target's (User 1) Restrictions:
SELECT * FROM Restrictions WHERE UserID = 1
Now I just put those as-is into PHP variables:
$targetMinAge = $row['minAge'];
$targetLookingFor = $row['lookingFor'];
$targetDrugs = $row['drugs'];
Now we just check whether the SENDER (User 2) fits those simple criteria:
SELECT COUNT(*)
FROM Users
WHERE
Users.UserID = 2 AND
Users.minAge >= $targetMinAge AND
Users.lookingFor IN ($targetLookingFor) AND
Users.drugs IN ($targetDrugs)
Finally, if COUNT == 1, User 2 can contact User 1, else they cannot.
How simple was THAT? It just seems really easy and straightforward, so what is the REAL problem with doing it this way as long as I sanitize all inputs to the DB every time a user updates their contact restrictions? Being able to use MySQL's IN function and already storing the multiple values in a format it will understand (e.g. comma-separated values) seems to make things so much easier than having to create join tables for every multiple-choice column. And I gave a simplified example, but what if there are 10 multiple choice columns? Then things start getting messy with so many join tables, whereas the CSV method stays simple.
So, in this case, is it really THAT bad if I use comma-separated values?
****ducks****

You already know the answer.
First off, your PHP code isn't even close to working, because it only works if User 2 has only a single value in lookingFor or drugs. If either of these columns contains multiple comma-separated values, then IN won't work, even if those values are in the exact same order as User 1's values. What do you expect IN to do if the left-hand side has one or more commas?
Therefore, it's not "easy" to do what you want in PHP. It's actually quite a pain and would involve splitting user 2's fields into single values, writing dynamic SQL with many ORs to do the comparison, and then doing an extremely inefficient query to get the results.
Furthermore, the fact that you even need to write PHP code to answer such a relatively simple question about the intersection of two sets means that your design is badly flawed. This is exactly the kind of problem (relational algebra) that SQL exists to solve. A correct design allows you to solve the problem in the database and then simply implement a presentation layer on top in PHP or some other technology.
Do it correctly and you'll have a much easier time.

Suppose User 1 is looking for 'Hang Out','Friendship' and User 2 is looking for 'Friendship','Hang Out'
Your code would not match them up, because 'Friendship','Hang Out' is not in ('Hang Out','Friendship')
That's the real problem here.
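To make the ordering problem concrete, here is a small PHP sketch (variable and function names are illustrative) showing that the two stored strings don't compare equal, while splitting them into individual values and comparing them as sets does. This is the extra work in PHP that a join table would let the database do instead.

```php
<?php
// The values mirror the 'Hang Out' / 'Friendship' example above.
$user1LookingFor = 'Hang Out,Friendship';
$user2LookingFor = 'Friendship,Hang Out';

// Naive comparison of the stored strings: order matters, so this is false.
var_dump($user1LookingFor === $user2LookingFor); // bool(false)

// Split a CSV string into individual values, trimming stray spaces.
function csvToSet(string $csv): array
{
    return array_map('trim', explode(',', $csv));
}

$a = csvToSet($user1LookingFor);
$b = csvToSet($user2LookingFor);

// Overlap: do the two users want at least one thing in common?
$common = array_values(array_intersect($a, $b));
print_r($common);

// Same set regardless of order: nothing left over in either direction.
$sameSet = !array_diff($a, $b) && !array_diff($b, $a);
var_dump($sameSet); // bool(true)
```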

Two-part MySQL question: Accessing specific MySQL row, and column performance

I have a table with about 150 websites listed in it with the columns "site_name", "visible_name" (basically a formatted name), and "description." For a given page on my site, I want to pull site_name and visible_name for every site in the table, and I want to pull all three columns for the selected site, which comes from the $_GET array (a URL parameter).
Right now I'm using 2 queries to do this: one that says "Get site_name and visible_name for all sites" and another that says "Get all 3 fields for one specific site." I'm guessing a better way to do it is:
SELECT * FROM site_list;
thus reducing to 1 query, and then doing the rest post-query, which brings up 2 questions:
The "description" field for each site is about 200-300 characters. Is it bad from a performance standpoint to pull this for all 150 sites if I'm only using it for 1 site?
How do I reference the specific row from the MySQL result set for the site specified in the URL? For example, if the URL is "mysite.com/results?site_name=foo", how do I do the post-query equivalent of SELECT * FROM site_list WHERE site_name=foo; ?
I don't know how to get the data for "site_name=foo" without looping through the entire result array and checking to see if site_name matches the URL parameter. Isn't there a more efficient way to do it?
Thanks,
Chris
PS: I noticed a similar question on Stack Overflow and read through all the answers, but they didn't help in my situation, which is why I'm posting this.

I believe what you do now, keeping separate queries for getting a list of sites with just titles and one detailed view with the description for a single given site, is good. You don't pull any unneeded data, and both queries are simple and fast.
It is possible to combine both your queries into one using a LEFT JOIN, something like:
SELECT s1.site_name, s1.visible_name, s2.description
FROM site_list s1
LEFT JOIN
( SELECT site_name, description
FROM site_list
WHERE site_name = 'this site should go with description' ) s2
ON s2.site_name = s1.site_name
This returns every site, with NULL as the description for all sites except the matching one. You could even sort it using
ORDER BY description DESC, site_name
to get the site with the description as the first fetched row, eliminating the need to iterate through the results to find it. But MySQL would have to do a lot more work to give you this result, negating any possible gain. So basically, stick to what you have now; it's good.

Generally, it's good practice to have an 'id' field in the table as an auto_increment value. Then you would:
SELECT id, site_name, visible_name FROM site_list;
and you'd have the 'id' value there to later:
SELECT * FROM site_list WHERE id = 123;
That's probably your most efficient method if you had WAAAY more entries in the table.
However, with only 150 rows in the table, you're probably just fine doing
SELECT * FROM site_list;
and only accessing that last field for the matching row based on your criteria.

If you only need the description for the site named foo you could just query the database with SELECT * FROM site_list WHERE site_name = 'foo' LIMIT 1
Otherwise you would have to loop through the result array and do a string comparison on site_name to find the correct description.
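If you do fetch all rows with one query, you can avoid scanning them for every lookup: re-key the result once with array_column() and then read the requested site directly. A minimal sketch, assuming the rows are associative arrays with the question's column names; the sample data and the 'foo' default are made up.

```php
<?php
// Hypothetical result rows, as returned by SELECT * FROM site_list
// fetched as associative arrays.
$rows = [
    ['site_name' => 'foo', 'visible_name' => 'Foo Site', 'description' => 'About foo'],
    ['site_name' => 'bar', 'visible_name' => 'Bar Site', 'description' => 'About bar'],
];

// Re-index the rows by site_name once; a null column key keeps whole rows.
$bySiteName = array_column($rows, null, 'site_name');

// Each lookup is then a direct key access instead of a loop.
$requested = $_GET['site_name'] ?? 'foo';
$selected = $bySiteName[$requested] ?? null;

if ($selected !== null) {
    echo $selected['description'], PHP_EOL;
}
```

The list of all sites is still available in $rows (or $bySiteName) for the navigation part of the page.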

Using explode, split, or preg_split to store and get multiple database entries

I'm trying to figure out how, and with which function, it's best to store multiple entries in a database and get them back out: explode, split, or preg_split. What I need to achieve is a user using a text field in a form to either send multiple messages to different users or share data with multiple users by entering their IDs like "101,102,103", and the PHP code being smart enough to grab each ID by splitting on the ",". I know this is asking a lot, but I need help from people more skilled in this area. I need to know how to make the PHP code grab the IDs and be able to use functions with them, like grabbing "101,102,103" from a database cell and then fetching different stored information from the database using the IDs grabbed from that one string.
How can I achieve this? An example would be very helpful.
Thanks

If I understand your question correctly, and you're dealing with comma-delimited strings of ID numbers, it would probably be simplest to keep them in this format, because you could use it directly in your SQL statement when querying the database.
I'm assuming that you want to run a SELECT query to grab the users whose IDs have been entered, correct? You'd want to use a SELECT ... WHERE IN ... type of statement, like this:
// Get the ids the user submitted
$ids = $_POST['ids'];
// Sanitize: cast every comma-separated piece to an integer so that
// nothing but numbers ends up in the SQL string (guards against SQL injection)
$ids = implode(',', array_map('intval', explode(',', $ids)));
$sql = "SELECT * FROM users WHERE ID IN ($ids)";
// execute your SQL statement
Alternatively, you could use explode to create an array of each individual ID, and then loop through so you could do some checking on each value to make sure it's correct, before using implode to concatenate them back together into a string that you can use in your SELECT ... WHERE IN ... statement.
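The explode, check, then implode route just described could look like the sketch below. It assumes IDs are positive integers, and the function name parseIds is hypothetical. Instead of interpolating the IDs into the SQL string, it builds one ? placeholder per ID, so the values can be bound through a prepared statement (for example PDOStatement::execute($ids)) rather than sanitized by hand.

```php
<?php
// Turn user input like "101,102,103" into a clean list of integer IDs.
// Assumption: IDs are positive integers; anything else is dropped.
function parseIds(string $input): array
{
    $ids = [];
    foreach (explode(',', $input) as $part) {
        $part = trim($part);
        // Accept only runs of digits (no signs, no empty pieces, no text).
        if (preg_match('/\A\d+\z/', $part) === 1) {
            $ids[] = (int) $part;
        }
    }
    return array_values(array_unique($ids));
}

$ids = parseIds('101, 102,abc,103,102');

// Guard against an empty list: "IN ()" is not valid SQL.
if ($ids === []) {
    exit("No valid IDs\n");
}

// One "?" placeholder per ID, to be bound when the statement is executed.
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$sql = "SELECT * FROM users WHERE ID IN ($placeholders)";
echo $sql, PHP_EOL; // SELECT * FROM users WHERE ID IN (?,?,?)
```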
Edit: Sorry, I forgot to add: in terms of storing the list of user ids in the database, you could store the comma-delimited list as a string against a message id, but that has drawbacks (it's difficult to do JOINs on other tables if you need to). The better option would be to create a lookup table, which basically consists of two columns: messageid and userid. You could then store each individual userid against the messageid, e.g.
messageid | userid
1 | 1
1 | 3
1 | 5
The benefit of this approach is that you can then use this table to join other tables (maybe you have a separate message table that stores details of the message itself).
Under this method, you'd create a new entry in the message table, get the id back, then explode the userids string into its separate parts, and finally create your INSERT statement to insert the data using the individual ids and the message id. You'd need to work out other mechanisms to handle any editing of the list of userids for a message, and deletion as well.
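The insert step of that method, exploding the user IDs and pairing each one with the message id, can be sketched as follows. The message id is hard-coded here; in practice it would come back from inserting the message row (for example via PDO::lastInsertId()). The lookup table name message_users is hypothetical.

```php
<?php
// Assumption: $messageId was returned by the INSERT into the message table.
$messageId = 1;
$userIdsInput = '1,3,5';

// Explode the submitted string and keep only numeric IDs.
$userIds = array_filter(
    array_map('trim', explode(',', $userIdsInput)),
    fn(string $id): bool => preg_match('/\A\d+\z/', $id) === 1
);

// One lookup row per recipient: (messageid, userid).
$rows = [];
foreach ($userIds as $userId) {
    $rows[] = [$messageId, (int) $userId];
}

// Each row would be bound to a prepared statement such as
// INSERT INTO message_users (messageid, userid) VALUES (?, ?)
foreach ($rows as [$mid, $uid]) {
    echo "$mid | $uid", PHP_EOL;
}
```

The printed rows correspond to the messageid | userid table shown above.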
Hope that made sense!

Well, considering the three functions you suggested:
explode() will work fine if you have a simple delimiter that's always the same.
For instance, always ', ' but never ','.
split() uses POSIX regex; it was deprecated in PHP 5.3 and removed in PHP 7.0, so it should not be used anymore.
preg_split() uses a regex as its pattern, so it will accept more situations than explode().
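The difference between explode() and preg_split() shows up as soon as users type inconsistent spacing. A small sketch with made-up input:

```php
<?php
// explode() needs the exact delimiter, so mixed spacing leaves stray spaces.
$input = '101, 102 ,103';
var_dump(explode(',', $input)); // elements: '101', ' 102 ', '103'

// preg_split() takes a regex, so one pattern absorbs optional whitespace
// around each comma; PREG_SPLIT_NO_EMPTY drops empties from trailing commas.
$ids = preg_split('/\s*,\s*/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
print_r($ids); // elements: '101', '102', '103'
```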
Then: do not store several values in a single database column; it'll be impossible to do any kind of useful work with that!
Create a different table to store those data, with a single value per row -- having several rows corresponding to one line in the first table.

I think your problem is more with SQL than with PHP.
Technically you could store IDs in a single MySQL field (a 'set' field or a comma-separated string) and query against it with FIND_IN_SET in your conditions. The lookups can be fast on small tables, but FIND_IN_SET cannot use an index, this is not considered best practice, and it creates a denormalized database.
What is best practice, and normalized, is to create separate relationship tables. So, using your example of messages, you would probably have a 'users' table, a 'messages' table, and a 'users_messages' table relating messages to users. The 'messages' table would contain the message information and maybe a 'user_id' field for the original sender (since there can only be one), and the 'users_messages' table would simply contain 'user_id' and 'message_id' fields, with rows linking messages to the various users they belong to. Then you just need JOIN queries to retrieve the data; for example, to retrieve a user's inbox, a query would look something like this:
SELECT
messages.*
FROM
messages
LEFT JOIN users_messages ON users_messages.message_id = messages.message_id
WHERE
users_messages.user_id = '(some user id)'

Database structure for saving search results

I currently work for a social networking website.
My boss recently had the idea to show search results in random order instead of the normal order (registration date). The problem with that is simple and obvious: if you go from one page to another, it will show you different results each time, because the list is randomized on every request.
I had the idea to store the results in the database plus cookies, something like this:
Cookie containing a serialized version of the $_POST request (needed if we want to do a re-sort)
A table which would serve as the base for the search id => searches (id,user_id, creation_date)
A table which would store the results and their order => searches_results (search_id, order, user_id)
The flow would look something like this:
After each search, I store the "where" clause in a cookie or session
Then I erase the previous search in "searches"
Then I delete previous results in "searches_results"
Then I insert a row into "searches" for the key
Then I insert each user row into "searches_results"
And finally I redirect the user to something like ?search_id=[search_key]
There is a big flaw here: performance. This could definitely bring the system down or make it very slow.
Any idea what would be the best to structure this ?

What if instead of ordering randomly, you ordered by some function where the order is known and repeatable, just non-obvious? You could seed such a function with some data from the search query to make it be even less obvious that it repeats. This way, you can page back and forth through your results and always get what you expect. Music players use this sort of function for their shuffle feature (so that if you click back, you get the previous song, and if you click next again, you're back where you started). I'm sure you can divine some function to accomplish this... bitwise XORing ID values with some constant (from the query) and then sorting by the resulting number might be sufficient. I chose XOR arbitrarily because it's a trivially simple function that will get you repeatable and non-obvious results.
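A minimal sketch of the XOR idea in PHP. Here ^ on integers is a bitwise XOR that returns a number (MySQL's ^ operator behaves the same way, so ORDER BY id ^ seed would be the database-side equivalent), which is what makes it usable as a sort key. Because XOR with a fixed seed maps distinct IDs to distinct keys, the order has no ties and is the same on every request. Deriving the seed from a CRC of the search text is an assumption for illustration, and xorOrder is a hypothetical name.

```php
<?php
// Repeatable "shuffle": order IDs by (id XOR seed) instead of RAND().
function xorOrder(array $ids, int $seed): array
{
    usort($ids, fn(int $a, int $b): int => ($a ^ $seed) <=> ($b ^ $seed));
    return $ids;
}

// Hypothetical seed source: the same search text always gives the same seed.
$seed = crc32('john smith');
$ordered = xorOrder([1, 2, 3, 4, 5, 6, 7, 8], $seed);

// Paging is just slicing the stable order.
$perPage = 3;
$page2 = array_slice($ordered, $perPage, $perPage);
print_r($page2);
```

For example, with seed 5 the IDs 1, 2, 3, 4 get keys 4, 7, 6, 1, so they come out as 4, 1, 3, 2. The order looks shuffled but is fully predictable, which is the trade-off the answer describes.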

Hmm, maybe, but doesn't the XOR operator only tell you whether something is an exclusive OR? As far as I know, there is no mathematical operation involved.

Sorry, I know this doesn't help, but I don't understand why your boss would want this.
I know that if I search for a person on a social network, then I want the results to be ordered by relevance and relevance only. I think that randomized results would just be frustrating for the user, but maybe that's just me.
For example, if I search for "John Smith", then the first batch of results had better be people named "John Smith". Then show me similar names near the end of the results. I don't want to search for "John Smith" and get "Jon Smithers" as my second result.

Well, I'm with Matt in asking "Why?"
I think rmeador has a good suggestion as well. You could sort by a different field or use some sort of algorithm; even just the permutations of DESC / ASC on last updated or some other result field would vary the order.
Another option would be to do an initial search the first time, return only the matching IDs, store the full ID string in the database, and make each subsequent page a lookup against those IDs.
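The "search once, store the IDs, page through them" option can be sketched like this. The matching IDs and the idea of keeping the string in a searches row are illustrative assumptions, and pageOfIds is a hypothetical helper. The randomization happens only once, at search time, so every later page request sees the same order.

```php
<?php
// Result of the initial search (hypothetical IDs).
$matchingIds = [42, 7, 19, 3, 88, 15, 61];
shuffle($matchingIds); // randomize once, at search time
$storedIds = implode(',', $matchingIds); // saved with the search

// On each page request, rebuild the list and slice out the requested page.
function pageOfIds(string $storedIds, int $page, int $perPage): array
{
    $ids = array_map('intval', explode(',', $storedIds));
    return array_slice($ids, ($page - 1) * $perPage, $perPage);
}

$page1 = pageOfIds($storedIds, 1, 3);
$page1Again = pageOfIds($storedIds, 1, 3);
var_dump($page1 === $page1Again); // bool(true): same page, same results
```

Each page then only needs a SELECT ... WHERE id IN (...) for a handful of IDs, instead of re-running and re-randomizing the full search.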
My two cents.
I can see a scenario where a randomized result set is useful, not for searching but for browsing profiles, artists, or local events. It offers more exposure to those who wouldn't show up in a traditionally directed search.
