Extract content of <script> with BeautifulSoup
1/ I am trying to extract part of a script with Beautiful Soup, but it prints nothing. What is wrong?
URL = "http://www.reuters.com/video/2014/08/30/woman-who-drank-restaurants-tainted-tea?videoId=341712453"
oururl= urllib2.urlopen(URL).read()
soup = BeautifulSoup(oururl)
for script in soup("script"):
    script.extract()
list_of_scripts = soup.findAll("script")
print list_of_scripts
2/ The goal is to extract the value of the "transcript" attribute:
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "VideoObject",
"video": {
"@type": "VideoObject",
"headline": "Woman who drank restaurant's tainted tea hopes for industry...",
"caption": "Woman who drank restaurant's tainted tea hopes for industry...",
"transcript": "Jan Harding is speaking out for the first time about the ordeal that changed her life. SOUNDBITE: JAN HARDING, DRANK TAINTED TEA, SAYING: \"Immediately my whole mouth was on fire.\" The Utah woman was critically burned in her mouth and esophagus after taking a sip of sweet tea laced with a toxic cleaning solution at Dickey's BBQ. SOUNDBITE: JAN HARDING, DRANK TAINTED TEA, SAYING: \"It was like a fire beyond anything you can imagine. I mean, it was not like drinking hot coffee.\" Authorities say an employee mistakenly mixed the industrial cleaning solution containing lye into the tea thinking it was sugar. The Hardings hope the incident will bring changes in the restaurant industry to avoid such dangerous mixups. SOUNDBITE: JIM HARDING, HUSBAND, SAYING: \"Bottom line, so no one ever has to go through this again.\" The district attorney's office is expected to decide in the coming week whether criminal charges will be filed.",
extract removes the tags from the DOM. That is why you get an empty list.
Find the script tag with the type="application/ld+json" attribute and decode its contents with json.loads. You can then access the data as a Python data structure (a dict for the given data).
print ''.join(soup.find('span', id='articleText').strings)
soup.find('span', id='articleText').strings returns None as the result:
#!/usr/bin/python
import urllib2
from BeautifulSoup import BeautifulSoup
URL = "http://www.reuters.com/article/2014/08/26/ichitan-group-indonesia-green-tea-idUSL3N0QW27F20140826"
Content = urllib2.urlopen(URL).read()
soup = BeautifulSoup(Content)
soup.find('span',id='articleText').strings
print ''.join(soup.find('span',id='articleText').strings)
for i,tag in enumerate(soup.findAll('script')): print(i,tag.text)
If you are using IPython, you can try ??Tag or ??Find to show the underlying implementation and find other attributes.
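The advice in the first answer (do not call extract before searching, find the script tag by its type attribute, decode it with json.loads) can be sketched as follows. This is a minimal sketch, not the answerer's own code. It assumes Python 3 and BeautifulSoup 4 (the bs4 package) rather than the urllib2 and BeautifulSoup 3 imports used in the thread. SAMPLE_HTML and get_transcript are hypothetical names, and the sample is a shortened stand-in for the page: the snippet in the question is truncated, so the closing braces and the abbreviated transcript are assumptions.

```python
import json

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4, not BeautifulSoup 3

# Hypothetical, shortened stand-in for the Reuters page. The JSON in the
# question is cut off, so the closing braces here are an assumption.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">var unrelated = 1;</script>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "VideoObject",
  "video": {
    "@type": "VideoObject",
    "headline": "Woman who drank restaurant's tainted tea hopes for industry...",
    "transcript": "Jan Harding is speaking out for the first time about the ordeal that changed her life."
  }
}
</script>
</head><body></body></html>
"""


def get_transcript(html):
    """Return the "transcript" value from the first JSON-LD script tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Select the script tag by its type attribute instead of taking all scripts.
    tag = soup.find("script", type="application/ld+json")
    if tag is None or not tag.string:
        return None
    # The tag body is plain JSON text, so json.loads turns it into a dict.
    data = json.loads(tag.string)
    return data.get("video", {}).get("transcript")


transcript = get_transcript(SAMPLE_HTML)
print(transcript)

# For comparison, the pattern from the question: extract() detaches each tag
# from the tree, so a search afterwards has nothing left to find.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for script in soup.find_all("script"):
    script.extract()
remaining = soup.find_all("script")
print(remaining)
```

For the live page, the HTML passed to get_transcript would come from the HTTP response instead of SAMPLE_HTML. Whether the page still serves this JSON-LD block is not something the thread confirms.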