I've been trying to figure this out for a while now. I want to not only get the contents of a web page using file_get_contents($url), but also pull specific data out of it.
One page I'm interested in is Craigslist, though it is just one example. I'd like to build an array of states with their areas and the accompanying websites, but I can't find a way to get at specific elements of the page. Any help would be really appreciated!
Try using DOMDocument:
$html = 'Assume this is html that you get';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false; // set options before loading; they are not applied retroactively
$dom->loadHTML($html);
$tables = $dom->getElementsByTagName('table'); // sample: get the table elements
$rows = $tables->item(0)->getElementsByTagName('tr'); // sample: get the rows of the first table
Here is the description
Have a look at the DOMDocument class in PHP.
<?php
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
echo $doc->saveHTML();
?>
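For the array of states and areas that the question asks about, a minimal sketch with DOMDocument and DOMXPath might look like the following. The markup in $html is invented for illustration (one <h4> per state followed by a <ul> of links); the real page may be structured differently, in which case the XPath expressions would need adjusting.

```php
<?php
// Hypothetical markup: a heading per state, then a list of area links.
$html = '<html><body>'
      . '<h4>Alabama</h4>'
      . '<ul><li><a href="http://auburn.craigslist.org">auburn</a></li>'
      . '<li><a href="http://bham.craigslist.org">birmingham</a></li></ul>'
      . '<h4>Alaska</h4>'
      . '<ul><li><a href="http://anchorage.craigslist.org">anchorage</a></li></ul>'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$sites = [];
foreach ($xpath->query('//h4') as $heading) {
    $state = trim($heading->textContent);
    // First <ul> sibling that follows this heading.
    $list = $xpath->query('following-sibling::ul[1]', $heading)->item(0);
    if ($list === null) {
        continue;
    }
    foreach ($xpath->query('.//a[@href]', $list) as $a) {
        // state => [area name => URL]
        $sites[$state][trim($a->textContent)] = $a->getAttribute('href');
    }
}
print_r($sites);
```

In a real script, $html would come from file_get_contents($url) instead of the inline string.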
Use http://simplehtmldom.sourceforge.net/
Example: getting all links from http://www.craigslist.org/about/sites
include 'simple_html_dom.php';
$url = "http://www.craigslist.org/about/sites";
$html = file_get_html($url);
echo "<pre>";
foreach ($html->find('a') as $element) {
    $link = $element->href;
    $link = ltrim($link, "/");
    if (!preg_match("/http/i", $link)) {
        $link = $url . $link;
    }
    echo $link . PHP_EOL;
    flush();
}
Output
http://www.craigslist.org/about/sites#US
http://www.craigslist.org/about/sites#CA
http://www.craigslist.org/about/sites#EU
http://www.craigslist.org/about/sites#ASIA
http://www.craigslist.org/about/sites#OCEANIA
http://www.craigslist.org/about/sites#LATAM
http://www.craigslist.org/about/sites#AF
http://www.craigslist.org/about/sites
http://auburn.craigslist.org
http://bham.craigslist.org
http://dothan.craigslist.org
http://shoals.craigslist.org
http://gadsden.craigslist.org
http://huntsville.craigslist.org
http://mobile.craigslist.org
http://montgomery.craigslist.org
http://tuscaloosa.craigslist.org
http://anchorage.craigslist.org
http://fairbanks.craigslist.org
http://kenai.craigslist.org
http://juneau.craigslist.org
http://flagstaff.craigslist.org
http://mohave.craigslist.org
http://phoenix.craigslist.org
http://prescott.craigslist.org
http://showlow.craigslist.org
http://sierravista.craigslist.org
http://tucson.craigslist.org
http://yuma.craigslist.org
http://fayar.craigslist.org
http://fortsmith.craigslist.org
http://jonesboro.craigslist.org
http://littlerock.craigslist.org
http://texarkana.craigslist.org
http://bakersfield.craigslist.org
http://chico.craigslist.org
http://fresno.craigslist.org
http://goldcountry.craigslist.org
http://hanford.craigslist.org
http://humboldt.craigslist.org
http://imperial.craigslist.org
http://inlandempire.craigslist.org
http://losangeles.craigslist.org
http://mendocino.craigslist.org
http://merced.craigslist.org
http://modesto.craigslist.org
http://monterey.craigslist.org
http://orangecounty.craigslist.org
http://palmsprings.craigslist.org
http://redding.craigslist.org
http://sacramento.craigslist.org
http://sandiego.craigslist.org
http://sfbay.craigslist.org
http://slo.craigslist.org
http://santabarbara.craigslist.org
http://santamaria.craigslist.org
http://siskiyou.craigslist.org
http://stockton.craigslist.org
http://susanville.craigslist.org
http://ventura.craigslist.org
http://visalia.craigslist.org
http://yubasutter.craigslist.org
http://boulder.craigslist.org
http://cosprings.craigslist.org
http://denver.craigslist.org
http://eastco.craigslist.org
http://fortcollins.craigslist.org
http://rockies.craigslist.org
http://pueblo.craigslist.org
http://westslope.craigslist.org
http://newlondon.craigslist.org
http://hartford.craigslist.org
http://newhaven.craigslist.org
http://nwct.craigslist.org
http://delaware.craigslist.org
http://washingtondc.craigslist.org
http://daytona.craigslist.org
http://keys.craigslist.org
http://fortlauderdale.craigslist.org
http://fortmyers.craigslist.org
http://gainesville.craigslist.org
http://cfl.craigslist.org
http://jacksonville.craigslist.org
http://lakeland.craigslist.org
http://lakecity.craigslist.org
http://ocala.craigslist.org
http://okaloosa.craigslist.org
http://orlando.craigslist.org
http://panamacity.craigslist.org
http://pensacola.craigslist.org
http://sarasota.craigslist.org
http://miami.craigslist.org
http://spacecoast.craigslist.org
http://staugustine.craigslist.org
http://tallahassee.craigslist.org
http://tampa.craigslist.org
http://treasure.craigslist.org
http://westpalmbeach.craigslist.org
http://albanyga.craigslist.org
http://athensga.craigslist.org
http://atlanta.craigslist.org
http://augusta.craigslist.org
http://brunswick.craigslist.org
http://columbusga.craigslist.org
http://macon.craigslist.org
http://nwga.craigslist.org
http://savannah.craigslist.org
http://statesboro.craigslist.org
http://valdosta.craigslist.org
http://honolulu.craigslist.org
http://boise.craigslist.org
http://eastidaho.craigslist.org
http://lewiston.craigslist.org
http://twinfalls.craigslist.org
http://bn.craigslist.org
http://chambana.craigslist.org
http://chicago.craigslist.org
http://decatur.craigslist.org
http://lasalle.craigslist.org
http://mattoon.craigslist.org
http://peoria.craigslist.org
http://rockford.craigslist.org
http://carbondale.craigslist.org
http://springfieldil.craigslist.org
http://quincy.craigslist.org
http://bloomington.craigslist.org
http://evansville.craigslist.org
http://fortwayne.craigslist.org
http://indianapolis.craigslist.org
http://kokomo.craigslist.org
http://tippecanoe.craigslist.org
http://muncie.craigslist.org
http://richmondin.craigslist.org
http://southbend.craigslist.org
http://terrehaute.craigslist.org
http://ames.craigslist.org
http://cedarrapids.craigslist.org
http://desmoines.craigslist.org
http://dubuque.craigslist.org
http://fortdodge.craigslist.org
http://iowacity.craigslist.org
http://masoncity.craigslist.org
http://quadcities.craigslist.org
http://siouxcity.craigslist.org
http://ottumwa.craigslist.org
http://waterloo.craigslist.org
http://lawrence.craigslist.org
http://ksu.craigslist.org
http://nwks.craigslist.org
http://salina.craigslist.org
http://seks.craigslist.org
http://swks.craigslist.org
http://topeka.craigslist.org
http://wichita.craigslist.org
http://bgky.craigslist.org
http://eastky.craigslist.org
http://lexington.craigslist.org
http://louisville.craigslist.org
http://owensboro.craigslist.org
http://westky.craigslist.org
http://batonrouge.craigslist.org
http://cenla.craigslist.org
http://houma.craigslist.org
http://lafayette.craigslist.org
http://lakecharles.craigslist.org
http://monroe.craigslist.org
http://neworleans.craigslist.org
http://shreveport.craigslist.org
http://maine.craigslist.org
http://annapolis.craigslist.org
http://baltimore.craigslist.org
http://easternshore.craigslist.org
http://frederick.craigslist.org
http://smd.craigslist.org
http://westmd.craigslist.org
http://boston.craigslist.org
http://capecod.craigslist.org
http://southcoast.craigslist.org
http://westernmass.craigslist.org
http://worcester.craigslist.org
http://annarbor.craigslist.org
http://battlecreek.craigslist.org
http://centralmich.craigslist.org
http://detroit.craigslist.org
http://flint.craigslist.org
http://grandrapids.craigslist.org
http://holland.craigslist.org
http://jxn.craigslist.org
http://kalamazoo.craigslist.org
http://lansing.craigslist.org
http://monroemi.craigslist.org
http://muskegon.craigslist.org
http://nmi.craigslist.org
http://porthuron.craigslist.org
http://saginaw.craigslist.org
http://swmi.craigslist.org
http://thumb.craigslist.org
http://up.craigslist.org
http://bemidji.craigslist.org
http://brainerd.craigslist.org
http://duluth.craigslist.org
http://mankato.craigslist.org
http://minneapolis.craigslist.org
http://rmn.craigslist.org
http://marshall.craigslist.org
http://stcloud.craigslist.org
http://gulfport.craigslist.org
http://hattiesburg.craigslist.org
http://jackson.craigslist.org
http://meridian.craigslist.org
http://northmiss.craigslist.org
http://natchez.craigslist.org
http://columbiamo.craigslist.org
http://joplin.craigslist.org
http://kansascity.craigslist.org
http://kirksville.craigslist.org
http://loz.craigslist.org
http://semo.craigslist.org
http://springfield.craigslist.org
http://stjoseph.craigslist.org
http://stlouis.craigslist.org
http://billings.craigslist.org
http://bozeman.craigslist.org
http://butte.craigslist.org
http://greatfalls.craigslist.org
http://helena.craigslist.org
http://kalispell.craigslist.org
http://missoula.craigslist.org
http://montana.craigslist.org
http://grandisland.craigslist.org
http://lincoln.craigslist.org
http://northplatte.craigslist.org
http://omaha.craigslist.org
http://scottsbluff.craigslist.org
http://elko.craigslist.org
http://lasvegas.craigslist.org
http://reno.craigslist.org
http://nh.craigslist.org
http://cnj.craigslist.org
http://jerseyshore.craigslist.org
http://newjersey.craigslist.org
http://southjersey.craigslist.org
http://albuquerque.craigslist.org
http://clovis.craigslist.org
http://farmington.craigslist.org
http://lascruces.craigslist.org
http://roswell.craigslist.org
http://santafe.craigslist.org
http://albany.craigslist.org
http://binghamton.craigslist.org
http://buffalo.craigslist.org
http://catskills.craigslist.org
http://chautauqua.craigslist.org
http://elmira.craigslist.org
http://fingerlakes.craigslist.org
http://glensfalls.craigslist.org
http://hudsonvalley.craigslist.org
http://ithaca.craigslist.org
http://longisland.craigslist.org
http://newyork.craigslist.org
http://oneonta.craigslist.org
http://plattsburgh.craigslist.org
http://potsdam.craigslist.org
http://rochester.craigslist.org
http://syracuse.craigslist.org
http://twintiers.craigslist.org
http://utica.craigslist.org
http://watertown.craigslist.org
http://asheville.craigslist.org
http://boone.craigslist.org
http://charlotte.craigslist.org
http://eastnc.craigslist.org
http://fayetteville.craigslist.org
http://greensboro.craigslist.org
http://hickory.craigslist.org
http://onslow.craigslist.org
http://outerbanks.craigslist.org
http://raleigh.craigslist.org
http://wilmington.craigslist.org
http://winstonsalem.craigslist.org
http://bismarck.craigslist.org
http://fargo.craigslist.org
http://grandforks.craigslist.org
http://nd.craigslist.org
http://akroncanton.craigslist.org
http://ashtabula.craigslist.org
http://athensohio.craigslist.org
http://chillicothe.craigslist.org
http://cincinnati.craigslist.org
http://cleveland.craigslist.org
http://columbus.craigslist.org
http://dayton.craigslist.org
http://limaohio.craigslist.org
http://mansfield.craigslist.org
http://sandusky.craigslist.org
http://toledo.craigslist.org
http://tuscarawas.craigslist.org
http://youngstown.craigslist.org
http://zanesville.craigslist.org
http://lawton.craigslist.org
http://enid.craigslist.org
http://oklahomacity.craigslist.org
http://stillwater.craigslist.org
http://tulsa.craigslist.org
http://bend.craigslist.org
http://corvallis.craigslist.org
http://eastoregon.craigslist.org
http://eugene.craigslist.org
http://klamath.craigslist.org
http://medford.craigslist.org
http://oregoncoast.craigslist.org
http://portland.craigslist.org
http://roseburg.craigslist.org
http://salem.craigslist.org
http://altoona.craigslist.org
http://chambersburg.craigslist.org
http://erie.craigslist.org
http://harrisburg.craigslist.org
http://lancaster.craigslist.org
http://allentown.craigslist.org
http://meadville.craigslist.org
http://philadelphia.craigslist.org
http://pittsburgh.craigslist.org
http://poconos.craigslist.org
http://reading.craigslist.org
http://scranton.craigslist.org
http://pennstate.craigslist.org
http://williamsport.craigslist.org
http://york.craigslist.org
http://providence.craigslist.org
http://charleston.craigslist.org
http://columbia.craigslist.org
http://florencesc.craigslist.org
http://greenville.craigslist.org
http://hiltonhead.craigslist.org
http://myrtlebeach.craigslist.org
http://nesd.craigslist.org
http://csd.craigslist.org
http://rapidcity.craigslist.org
http://siouxfalls.craigslist.org
http://sd.craigslist.org
http://chattanooga.craigslist.org
http://clarksville.craigslist.org
http://cookeville.craigslist.org
http://jacksontn.craigslist.org
http://knoxville.craigslist.org
http://memphis.craigslist.org
http://nashville.craigslist.org
http://tricities.craigslist.org
http://abilene.craigslist.org
http://amarillo.craigslist.org
http://austin.craigslist.org
http://beaumont.craigslist.org
http://brownsville.craigslist.org
http://collegestation.craigslist.org
http://corpuschristi.craigslist.org
http://dallas.craigslist.org
http://nacogdoches.craigslist.org
http://delrio.craigslist.org
http://elpaso.craigslist.org
http://galveston.craigslist.org
http://houston.craigslist.org
http://killeen.craigslist.org
http://laredo.craigslist.org
http://lubbock.craigslist.org
http://mcallen.craigslist.org
http://odessa.craigslist.org
http://sanangelo.craigslist.org
http://sanantonio.craigslist.org
http://sanmarcos.craigslist.org
http://bigbend.craigslist.org
http://texoma.craigslist.org
http://easttexas.craigslist.org
http://victoriatx.craigslist.org
http://waco.craigslist.org
http://wichitafalls.craigslist.org
http://logan.craigslist.org
http://ogden.craigslist.org
http://provo.craigslist.org
http://saltlakecity.craigslist.org
http://stgeorge.craigslist.org
http://burlington.craigslist.org
http://charlottesville.craigslist.org
http://danville.craigslist.org
http://fredericksburg.craigslist.org
http://norfolk.craigslist.org
http://harrisonburg.craigslist.org
http://lynchburg.craigslist.org
http://blacksburg.craigslist.org
http://richmond.craigslist.org
http://roanoke.craigslist.org
http://swva.craigslist.org
http://winchester.craigslist.org
http://bellingham.craigslist.org
http://kpr.craigslist.org
http://moseslake.craigslist.org
http://olympic.craigslist.org
http://pullman.craigslist.org
http://seattle.craigslist.org
http://skagit.craigslist.org
http://spokane.craigslist.org
http://wenatchee.craigslist.org
http://yakima.craigslist.org
http://charlestonwv.craigslist.org
http://martinsburg.craigslist.org
http://huntington.craigslist.org
http://morgantown.craigslist.org
http://wheeling.craigslist.org
http://parkersburg.craigslist.org
http://swv.craigslist.org
http://wv.craigslist.org
http://appleton.craigslist.org
http://eauclaire.craigslist.org
http://greenbay.craigslist.org
http://janesville.craigslist.org
http://racine.craigslist.org
http://lacrosse.craigslist.org
http://madison.craigslist.org
http://milwaukee.craigslist.org
http://northernwi.craigslist.org
http://sheboygan.craigslist.org
http://wausau.craigslist.org
http://wyoming.craigslist.org
http://micronesia.craigslist.org
http://puertorico.craigslist.org
http://virgin.craigslist.org
http://www.craigslist.org/about/sites
http://calgary.craigslist.ca
http://edmonton.craigslist.ca
http://ftmcmurray.craigslist.ca
http://lethbridge.craigslist.ca
http://hat.craigslist.ca
http://peace.craigslist.ca
http://reddeer.craigslist.ca
http://cariboo.craigslist.ca
http://comoxvalley.craigslist.ca
http://abbotsford.craigslist.ca
http://kamloops.craigslist.ca
http://kelowna.craigslist.ca
http://cranbrook.craigslist.ca
http://nanaimo.craigslist.ca
http://princegeorge.craigslist.ca
http://skeena.craigslist.ca
http://sunshine.craigslist.ca
http://vancouver.craigslist.ca
http://victoria.craigslist.ca
http://whistler.craigslist.ca
http://winnipeg.craigslist.ca
http://newbrunswick.craigslist.ca
http://newfoundland.craigslist.ca
http://territories.craigslist.ca
http://yellowknife.craigslist.ca
http://halifax.craigslist.ca
http://barrie.craigslist.ca
http://belleville.craigslist.ca
http://brantford.craigslist.ca
http://chatham.craigslist.ca
http://cornwall.craigslist.ca
http://guelph.craigslist.ca
http://hamilton.craigslist.ca
http://kingston.craigslist.ca
http://kitchener.craigslist.ca
http://londonon.craigslist.ca
http://niagara.craigslist.ca
http://ottawa.craigslist.ca
http://owensound.craigslist.ca
http://peterborough.craigslist.ca
http://sarnia.craigslist.ca
http://soo.craigslist.ca
http://sudbury.craigslist.ca
http://thunderbay.craigslist.ca
http://toronto.craigslist.ca
http://windsor.craigslist.ca
http://pei.craigslist.ca
http://montreal.craigslist.ca
http://quebec.craigslist.ca
http://saguenay.craigslist.ca
http://sherbrooke.craigslist.ca
http://troisrivieres.craigslist.ca
http://regina.craigslist.ca
http://saskatoon.craigslist.ca
http://whitehorse.craigslist.ca
http://www.craigslist.org/about/sites
http://vienna.craigslist.at
http://brussels.craigslist.org
http://bulgaria.craigslist.org
http://zagreb.craigslist.org
http://prague.craigslist.cz
http://copenhagen.craigslist.org
http://helsinki.craigslist.fi
http://bordeaux.craigslist.org
http://rennes.craigslist.org
http://grenoble.craigslist.org
http://lille.craigslist.org
http://loire.craigslist.org
http://lyon.craigslist.org
http://marseilles.craigslist.org
http://montpellier.craigslist.org
http://cotedazur.craigslist.org
http://rouen.craigslist.org
http://paris.craigslist.org
http://strasbourg.craigslist.org
http://toulouse.craigslist.org
http://berlin.craigslist.de
http://bremen.craigslist.de
http://cologne.craigslist.de
http://dresden.craigslist.de
http://dusseldorf.craigslist.de
http://essen.craigslist.de
http://frankfurt.craigslist.de
http://hamburg.craigslist.de
http://hannover.craigslist.de
http://heidelberg.craigslist.de
http://kaiserslautern.craigslist.de
http://leipzig.craigslist.de
http://munich.craigslist.de
http://nuremberg.craigslist.de
http://stuttgart.craigslist.de
http://athens.craigslist.gr
http://budapest.craigslist.org
http://reykjavik.craigslist.org
http://dublin.craigslist.org
http://bologna.craigslist.it
http://florence.craigslist.it
http://genoa.craigslist.it
http://milan.craigslist.it
http://naples.craigslist.it
http://perugia.craigslist.it
http://rome.craigslist.it
http://sardinia.craigslist.it
http://sicily.craigslist.it
http://torino.craigslist.it
http://venice.craigslist.it
http://luxembourg.craigslist.org
http://amsterdam.craigslist.org
http://oslo.craigslist.org
http://warsaw.craigslist.pl
http://faro.craigslist.pt
http://lisbon.craigslist.pt
http://porto.craigslist.pt
http://bucharest.craigslist.org
http://moscow.craigslist.org
http://stpetersburg.craigslist.org
http://alicante.craigslist.es
http://baleares.craigslist.es
http://barcelona.craigslist.es
http://bilbao.craigslist.es
http://cadiz.craigslist.es
http://canarias.craigslist.es
http://granada.craigslist.es
http://madrid.craigslist.es
http://malaga.craigslist.es
http://sevilla.craigslist.es
http://valencia.craigslist.es
http://stockholm.craigslist.se
http://basel.craigslist.ch
http://bern.craigslist.ch
http://geneva.craigslist.ch
http://lausanne.craigslist.ch
http://zurich.craigslist.ch
http://istanbul.craigslist.com.tr
http://ukraine.craigslist.org
http://aberdeen.craigslist.co.uk
http://bath.craigslist.co.uk
http://belfast.craigslist.co.uk
http://birmingham.craigslist.co.uk
http://brighton.craigslist.co.uk
http://bristol.craigslist.co.uk
http://cambridge.craigslist.co.uk
http://cardiff.craigslist.co.uk
http://coventry.craigslist.co.uk
http://derby.craigslist.co.uk
http://devon.craigslist.co.uk
http://dundee.craigslist.co.uk
http://norwich.craigslist.co.uk
http://eastmids.craigslist.co.uk
http://edinburgh.craigslist.co.uk
http://essex.craigslist.co.uk
http://glasgow.craigslist.co.uk
http://hampshire.craigslist.co.uk
http://kent.craigslist.co.uk
http://leeds.craigslist.co.uk
http://liverpool.craigslist.co.uk
http://london.craigslist.co.uk
http://manchester.craigslist.co.uk
http://newcastle.craigslist.co.uk
http://nottingham.craigslist.co.uk
http://oxford.craigslist.co.uk
http://sheffield.craigslist.co.uk
http://www.craigslist.org/about/sites
http://micronesia.craigslist.org
http://bangladesh.craigslist.org
http://beijing.craigslist.com.cn
http://chengdu.craigslist.com.cn
http://chongqing.craigslist.com.cn
http://dalian.craigslist.com.cn
http://guangzhou.craigslist.com.cn
http://hangzhou.craigslist.com.cn
http://nanjing.craigslist.com.cn
http://shanghai.craigslist.com.cn
http://shenyang.craigslist.com.cn
http://shenzhen.craigslist.com.cn
http://wuhan.craigslist.com.cn
http://xian.craigslist.com.cn
http://hongkong.craigslist.hk
http://ahmedabad.craigslist.co.in
http://bangalore.craigslist.co.in
http://bhubaneswar.craigslist.co.in
http://chandigarh.craigslist.co.in
http://chennai.craigslist.co.in
http://delhi.craigslist.co.in
http://goa.craigslist.co.in
http://hyderabad.craigslist.co.in
http://indore.craigslist.co.in
http://jaipur.craigslist.co.in
http://kerala.craigslist.co.in
http://kolkata.craigslist.co.in
http://lucknow.craigslist.co.in
http://mumbai.craigslist.co.in
http://pune.craigslist.co.in
http://surat.craigslist.co.in
http://jakarta.craigslist.org
http://tehran.craigslist.org
http://baghdad.craigslist.org
http://haifa.craigslist.org
http://jerusalem.craigslist.org
http://telaviv.craigslist.org
http://ramallah.craigslist.org
http://fukuoka.craigslist.jp
http://hiroshima.craigslist.jp
http://nagoya.craigslist.jp
http://okinawa.craigslist.jp
http://osaka.craigslist.jp
http://sapporo.craigslist.jp
http://sendai.craigslist.jp
http://tokyo.craigslist.jp
http://seoul.craigslist.co.kr
http://kuwait.craigslist.org
http://beirut.craigslist.org
http://malaysia.craigslist.org
http://pakistan.craigslist.org
http://bacolod.craigslist.com.ph
http://naga.craigslist.com.ph
http://cdo.craigslist.com.ph
http://cebu.craigslist.com.ph
http://davaocity.craigslist.com.ph
http://iloilo.craigslist.com.ph
http://manila.craigslist.com.ph
http://pampanga.craigslist.com.ph
http://zamboanga.craigslist.com.ph
http://singapore.craigslist.com.sg
http://taipei.craigslist.com.tw
http://bangkok.craigslist.co.th
http://dubai.craigslist.org
http://vietnam.craigslist.org
http://www.craigslist.org/about/sites
http://adelaide.craigslist.com.au
http://brisbane.craigslist.com.au
http://cairns.craigslist.com.au
http://canberra.craigslist.com.au
http://darwin.craigslist.com.au
http://goldcoast.craigslist.com.au
http://melbourne.craigslist.com.au
http://ntl.craigslist.com.au
http://perth.craigslist.com.au
http://sydney.craigslist.com.au
http://hobart.craigslist.com.au
http://wollongong.craigslist.com.au
http://auckland.craigslist.org
http://christchurch.craigslist.org
http://dunedin.craigslist.co.nz
http://wellington.craigslist.org
http://www.craigslist.org/about/sites
http://caribbean.craigslist.org
http://buenosaires.craigslist.org
http://lapaz.craigslist.org
http://belohorizonte.craigslist.org
http://brasilia.craigslist.org
http://curitiba.craigslist.org
http://fortaleza.craigslist.org
http://portoalegre.craigslist.org
http://recife.craigslist.org
http://rio.craigslist.org
http://salvador.craigslist.org
http://saopaulo.craigslist.org
http://santiago.craigslist.org
http://colombia.craigslist.org
http://costarica.craigslist.org
http://santodomingo.craigslist.org
http://quito.craigslist.org
http://elsalvador.craigslist.org
http://guatemala.craigslist.org
http://acapulco.craigslist.com.mx
http://bajasur.craigslist.com.mx
http://chihuahua.craigslist.com.mx
http://juarez.craigslist.com.mx
http://guadalajara.craigslist.com.mx
http://guanajuato.craigslist.com.mx
http://hermosillo.craigslist.com.mx
http://mazatlan.craigslist.com.mx
http://mexicocity.craigslist.com.mx
http://monterrey.craigslist.com.mx
http://oaxaca.craigslist.com.mx
http://puebla.craigslist.com.mx
http://pv.craigslist.com.mx
http://tijuana.craigslist.com.mx
http://veracruz.craigslist.com.mx
http://yucatan.craigslist.com.mx
http://managua.craigslist.org
http://panama.craigslist.org
http://lima.craigslist.org
http://puertorico.craigslist.org
http://montevideo.craigslist.org
http://caracas.craigslist.org
http://virgin.craigslist.org
http://www.craigslist.org/about/sites
http://cairo.craigslist.org
http://addisababa.craigslist.org
http://accra.craigslist.org
http://kenya.craigslist.org
http://casablanca.craigslist.org
http://capetown.craigslist.co.za
http://durban.craigslist.co.za
http://johannesburg.craigslist.co.za
http://pretoria.craigslist.co.za
http://tunis.craigslist.org
http://www.craigslist.org/about/
http://blog.craigslist.com/
http://www.craigslist.org/about/help/system-status.html
http://www.craigslist.org/about/help/
http://www.craigslist.org/about/terms.of.use
http://www.craigslist.org/about/privacy_policy
http://sfbay.craigslist.org/forums/?forumID=1
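The output above mixes area sites with in-page anchors and footer links. One possible way to keep only the area sites is to filter by host name with parse_url(). This is a rough sketch: the sample list is copied from the output above, the exclusion list of subdomains is an assumption, and the ?? operator needs PHP 7 or later.

```php
<?php
// A few links taken from the output above.
$links = [
    'http://www.craigslist.org/about/sites#US',
    'http://auburn.craigslist.org',
    'http://calgary.craigslist.ca',
    'http://blog.craigslist.com/',
    'http://sfbay.craigslist.org/forums/?forumID=1',
];

$areas = [];
foreach ($links as $link) {
    $parts = parse_url($link);
    $host = $parts['host'] ?? '';
    $path = $parts['path'] ?? '';
    // Keep "<area>.craigslist.<tld>" hosts that point at the site root.
    if (preg_match('/^([a-z]+)\.craigslist\.[a-z.]+$/', $host, $m)
        && !in_array($m[1], ['www', 'blog'], true)   // assumed non-area subdomains
        && ($path === '' || $path === '/')) {
        $areas[$m[1]] = $link;
    }
}
print_r($areas);
```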
I am trying to figure out some things about getting data from an external page using the PHP file_get_contents function.
This is the PHP code I am trying to get to work:
$url = 'http://www.controller.com/listings/aircraft/for-sale/list/category/3/jet-aircraft/manufacturer/cessna/model/citation-mustang';
$content = file_get_contents($url);
$first_step = explode('<div class="listing">',$content);
$second_step = explode("</div>",$first_step[1]);
echo $second_step[0];
It's simple code that should echo the content of the divs with class 'listing'. For some reason, I keep getting this notice:
Notice: Undefined offset: 1
and I can't figure out how to fix it. When I turn off error reporting, it just returns an empty page. I have read that it has something to do with empty arrays, but I'm not sure how to fix it.
Thanks in advance!
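A likely cause of the notice: when the delimiter string does not occur in the input, explode() returns a one-element array, so index 1 does not exist. That happens as soon as the real markup differs from the exact string, for example an extra class on the div. The sketch below guards against that case; firstListing() is a hypothetical helper name and the sample strings stand in for the fetched page.

```php
<?php
// Hypothetical helper: returns the text after the first exact marker, or null.
function firstListing($content)
{
    if ($content === false) {
        return null;                 // file_get_contents() failed
    }
    $parts = explode('<div class="listing">', $content);
    if (!isset($parts[1])) {
        return null;                 // marker not found: explode() gave one element
    }
    $inner = explode('</div>', $parts[1]);
    return $inner[0];
}

// The marker is missing here because of the extra class.
var_dump(firstListing('<div class="listing featured">Cessna</div>'));
// The marker matches exactly here.
var_dump(firstListing('<div class="listing">Cessna</div>'));
```

String splitting stays fragile against small markup changes, which is why a DOM parser is usually the safer choice.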
You can get elements by class name using DOMDocument and DOMXPath:
$url = 'http://www.controller.com/listings/aircraft/for-sale/list/category/3/jet-aircraft/manufacturer/cessna/model/citation-mustang';
$content = file_get_contents($url);
$doc = new DOMDocument();
if (!$doc->loadHTML($content)) {
    die('error');
}
$a = new DOMXPath($doc);
$class = 'listing';
$divs = $a->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]");
// $divs contains every element that has "listing" among its classes
// you can get the content like this:
foreach ($divs as $div) {
    echo $div->nodeValue;
    // or
    echo $div->textContent;
}
More info in this Stack Overflow question: Get all elements by class name using DOMDocument
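To see why the class test is written with concat() and normalize-space() rather than a plain contains(@class, ...), here is a small sketch on invented markup. The padded form matches whole class names only, while the plain form also matches substrings such as "listings".

```php
<?php
// Invented sample markup with three divs.
$markup = '<div class="listing featured">A</div>'
        . '<div class="listings">B</div>'
        . '<div class="item listing">C</div>';

libxml_use_internal_errors(true);   // suppress parser warnings
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$class = 'listing';

// Substring test: also matches class="listings".
$naive = $xpath->query("//div[contains(@class, '$class')]");
// Whole-word test: pads the attribute with spaces before comparing.
$exact = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]");

echo $naive->length, "\n";   // 3
echo $exact->length, "\n";   // 2
```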
I am currently using PHP 5.5.15.
This is the code I use to write a simple XML file called comment.xml using DOM. The structure of the file, as illustrated below, is what I require. What I would appreciate is a code sample that lets me read all users and comments and output them, say, as HTML, and also a code sample to append to the file below.
Any help is much appreciated.
/*** a new dom object ***/
$dom = new domDocument;
/*** make the output tidy ***/
$dom->formatOutput = true;
/*** create the root element ***/
$root = $dom->appendChild($dom->createElement( "comments" ));
/*** create the simple xml element ***/
$sxe = simplexml_import_dom( $dom );
/*** add a user element ***/
$sxe->addChild("user", $User_Name);
/*** add a comment element ***/
$sxe->addChild("comment", $Comment);
$dom->save('comment.xml');
The output for the above code is:
<?xml version="1.0"?>
<comments>
<user>Joe Blogs</user>
<comment>This is a comment</comment>
</comments>
Something like this would suffice:
$xmlStr = '<container><comments><user>Joe Blogs</user><comment>This is a comment</comment></comments><comments><user>John Doe</user><comment>This is another comment</comment></comments></container>';
$dom = new DOMDocument;
// loadXML() returns false on failure; the DOMDocument object itself is always truthy
if (!$dom->loadXML($xmlStr)) {
    echo 'Error while parsing the document';
    exit;
}
$xmlObj = simplexml_import_dom($dom);
foreach ($xmlObj->comments as $child) {
    echo 'user: '.$child->user.'<br>';
    echo 'comment: '.$child->comment.'<br>';
    echo '-----------------<br>';
}
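The question also asks how to append an entry. A minimal sketch with DOMDocument could look like this; it assumes the container/comments layout used in the answer above rather than the flat file from the question, and the sample values are made up. createTextNode() is used so that characters such as & are escaped when the document is serialized.

```php
<?php
$xmlStr = '<container><comments><user>Joe Blogs</user>'
        . '<comment>This is a comment</comment></comments></container>';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;   // lets formatOutput re-indent the result
$dom->formatOutput = true;
if (!$dom->loadXML($xmlStr)) {
    exit('Error while parsing the document');
}

// Build <comments><user>...</user><comment>...</comment></comments>.
$entry = $dom->createElement('comments');
$user = $dom->createElement('user');
$user->appendChild($dom->createTextNode('John Doe'));
$comment = $dom->createElement('comment');
$comment->appendChild($dom->createTextNode('Pros & cons'));
$entry->appendChild($user);
$entry->appendChild($comment);

// Attach the new entry under the root element and print the document.
$dom->documentElement->appendChild($entry);
echo $dom->saveXML();
```

To work on the file itself, $dom->load('comment.xml') and $dom->save('comment.xml') can take the place of loadXML() and saveXML().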
I know there are similar questions, but while studying PHP I ran into this problem and I want to understand why it occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
?>
This prints just "hello!". I want to print the value extracted with the XPath query, but the last echo doesn't output anything.
You have some errors in your code:
You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/, but it is actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so fetch the iframe URL directly.
You use loadHTML(), which loads HTML from a string. What you need is loadHTMLFile(), which takes the path or URL of an HTML document as its parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php).
You assume there is a tbody element on the page, but there isn't one, so remove it from your XPath query.
Working code:
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
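The tbody point can be checked without fetching anything. In the sketch below the table markup is invented, with only the id table33 taken from the question. Browsers insert a tbody element when they build the DOM, so an XPath copied from browser tools often contains one, but DOMDocument::loadHTML() keeps the rows as direct children of the table when the source has no tbody.

```php
<?php
// Invented table; only the id comes from the question.
$markup = '<table id="table33">'
        . '<tr><td>date</td><td>unit</td><td>price</td></tr>'
        . '<tr><td>x</td><td>y</td><td><b>123.45</b></td></tr>'
        . '</table>';

libxml_use_internal_errors(true);   // suppress parser warnings
$doc = new DOMDocument();
$doc->loadHTML($markup);            // loadHTML() takes markup, not a URL
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$withTbody = $xpath->query("//*[@id='table33']/tbody/tr[2]/td[3]/b");
$withoutTbody = $xpath->query("//*[@id='table33']/tr[2]/td[3]/b");

echo $withTbody->length, "\n";      // number of matches with tbody in the path
echo $withoutTbody->length, "\n";   // number of matches without it
if ($withoutTbody->length > 0) {
    echo $withoutTbody->item(0)->nodeValue, "\n";
}
```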
I am trying to get the content of a specific tag, but I can't seem to do so using the following function:
<?PHP
include_once('simple_html_dom.php');
function read_page($url = 'http://google.com')
{
$doc = new DOMDocument();
$data = file_get_html($url);
$content = $data->find('div#footer');
print_r( $content);
}
read_page();
?>
Try $data->find('div[id="footer"]')
I'm attempting to make a script that echoes only the div that encloses the image on Google.
$url = "http://www.google.com/";
$page = file($url);
foreach($page as $theArray) {
echo $theArray;
}
The problem is that this echoes the whole page.
I want to echo only the part between the <div id="lga"> and the next closest </div>
Note: I have tried using ifs, but that wasn't working, so I deleted them.
Thanks
Use the built-in DOM methods:
<?php
$page = file_get_contents("http://www.google.com");
$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML($page);
libxml_use_internal_errors(false);
$domx = new DOMXPath($domd);
$lga = $domx->query("//*[@id='lga']")->item(0);
$domd2 = new DOMDocument();
$domd2->appendChild($domd2->importNode($lga, true));
echo $domd2->saveHTML();
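A shorter variant of the same idea: saveHTML() accepts an optional node argument, so the matched element can be serialized directly without importing it into a second document. This is a sketch on invented markup standing in for the fetched page; $fragment is just an illustrative variable name.

```php
<?php
// Invented stand-in for the downloaded page.
$page = '<html><body>'
      . '<div id="lga"><img src="logo.png" alt="logo"></div>'
      . '<div id="other">x</div>'
      . '</body></html>';

libxml_use_internal_errors(true);   // suppress parser warnings
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$lga = $xpath->query("//*[@id='lga']")->item(0);

$fragment = '';
if ($lga !== null) {
    // Serialize only this element, including its own tag and children.
    $fragment = $doc->saveHTML($lga);
}
echo $fragment;
```

The null check matters on a live page, where the element may be missing or renamed.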
To do this you need to parse the DOM and then look up the ID you want. Check out a parsing library such as http://simplehtmldom.sourceforge.net/manual.htm
After feeding your HTML document into the parser you could call something like:
$html = str_get_html($page);
$element = $html->find('div[id=lga]', 0); // index 0 returns the first match instead of an array
echo $element->plaintext;
That, I think, would be your quickest and easiest solution.