What is XML?
XML stands for eXtensible Markup Language.
XML is designed to transport and store data.
XML is a set of rules for defining semantic tags that divide a document into parts and identify those parts.
It is also a meta-markup language: it defines a syntax for creating other semantic, structured markup languages for specific domains.
Parsing XML in Python
The two common XML programming interfaces are DOM and SAX. They process XML files in different ways, so they suit different use cases.
Python offers three ways to parse XML: SAX, DOM, and ElementTree.
1. SAX (Simple API for XML)
The Python standard library includes a SAX parser. SAX is event-driven: while parsing, it fires events one by one and calls user-defined callback functions to process the XML.
2. DOM (Document Object Model)
DOM parses the XML data into a tree in memory, and you manipulate the XML through operations on that tree.
3. ElementTree
ElementTree works like a lightweight DOM with a convenient, friendly API. It is easy to use, fast, and light on memory.
Note: Because DOM has to map the whole XML document to a tree in memory, it is comparatively slow and memory-hungry. SAX reads the XML file as a stream, so it is faster and uses less memory, but it requires the user to implement callback functions (a handler).
The content of the XML example file movies.xml used in this chapter is as follows:
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>
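For comparison with the SAX and DOM examples later in this article, here is a minimal sketch of the ElementTree approach using the standard library module xml.etree.ElementTree. The MOVIES_XML string is a shortened copy of movies.xml inlined so the sketch runs on its own, and the summarize helper is a hypothetical name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of movies.xml, inlined so the sketch is self-contained.
MOVIES_XML = """
<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <year>2003</year>
  </movie>
  <movie title="Trigun">
    <type>Anime, Action</type>
    <episodes>4</episodes>
  </movie>
</collection>
"""


def summarize(xml_text):
    """Return (title, type, year) tuples; year is None when <year> is absent."""
    root = ET.fromstring(xml_text)
    rows = []
    for movie in root.findall("movie"):
        rows.append((
            movie.get("title"),        # attribute lookup
            movie.findtext("type"),    # text of the first <type> child
            movie.findtext("year"),    # None for movies without <year>
        ))
    return rows


for title, kind, year in summarize(MOVIES_XML):
    print(title, "|", kind, "|", year)
```

Note that findtext returns None for a missing child, which is convenient for records such as Trigun that have no year element.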
Parsing XML with SAX in Python
SAX is an event-driven API.
Parsing an XML document with SAX involves two parts: the parser and the event handler.
The parser reads the XML document and sends events, such as element-start and element-end events, to the event handler.
The event handler responds to those events and processes the XML data passed to it.
SAX is well suited to the following situations:
1. Processing large files.
2. Needing only part of a file, or only specific information from it.
3. Building your own object model.
To process XML with SAX in Python, first import the parse function from xml.sax and the ContentHandler class from xml.sax.handler.
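As a minimal sketch of those two imports in use, the handler below pulls only the title attribute from each movie element, which is the "specific information only" scenario listed above. TitleCollector and SAMPLE are hypothetical names for illustration, and the XML is passed as an in-memory byte stream rather than a file on disk.

```python
import io
from xml.sax import parse
from xml.sax.handler import ContentHandler


class TitleCollector(ContentHandler):
    """Hypothetical handler that keeps only the title of each <movie>."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        # Ignore every element except <movie>.
        if name == "movie":
            self.titles.append(attrs.get("title"))


# Assumed sample data; parse() accepts a file name or a file-like object.
SAMPLE = b'<collection><movie title="Enemy Behind"/><movie title="Trigun"/></collection>'

handler = TitleCollector()
parse(io.BytesIO(SAMPLE), handler)
print(handler.titles)  # ['Enemy Behind', 'Trigun']
```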
ContentHandler class methods
characters(content) method
When it is called:
From the start of a line up to the first tag, if there are characters, content holds that text.
Between one tag and the next, if there are characters, content holds that text.
From a tag up to the line terminator, if there are characters, content holds that text.
The tag can be a start tag or an end tag.
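Because a parser may deliver the text of one element in more than one characters() call, and also reports the whitespace between tags, a common pattern is to buffer the chunks and join them when the element ends. The following is a sketch under that assumption; TextCollector is a hypothetical name, not part of xml.sax.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that joins character chunks per element."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.texts = {}

    def startElement(self, name, attrs):
        self._chunks = []              # start a fresh buffer for this element

    def characters(self, content):
        self._chunks.append(content)   # may fire several times per element

    def endElement(self, name):
        text = "".join(self._chunks).strip()
        if text:                       # skip whitespace-only content
            self.texts[name] = text
        self._chunks = []


collector = TextCollector()
xml.sax.parseString(
    b"<movie><type>War, Thriller</type>\n<format>DVD</format></movie>",
    collector,
)
print(collector.texts)  # {'type': 'War, Thriller', 'format': 'DVD'}
```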
startDocument() method
Called when the parser begins reading the document.
endDocument() method
Called when the parser reaches the end of the document.
startElement(name, attrs) method
Called when an XML start tag is encountered. name is the tag name, and attrs is a dictionary-like object holding the tag's attribute values.
endElement(name) method
Called when an XML end tag is encountered.
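To see the order in which these four callbacks fire, the sketch below records each one in a list. EventLogger is a hypothetical name, and the one-movie document is assumed sample data.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each callback in firing order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values.
        self.events.append(("startElement", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("endElement", name))


logger = EventLogger()
xml.sax.parseString(b'<movie title="Ishtar"><format>VHS</format></movie>', logger)
for event in logger.events:
    print(event)
```

The start and end events nest the same way the tags do: the format element starts and ends inside the movie element, and the document events bracket everything.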
make_parser method
The following method creates a new parser object and returns it.
xml.sax.make_parser( [parser_list] )
Parameter description:
parser_list - optional; a list of parser module names to try, in order
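A minimal sketch of the make_parser workflow: create the parser, configure it, attach a handler, then parse. ElementCounter is a hypothetical handler, and the input is an assumed in-memory byte stream standing in for a file.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class ElementCounter(ContentHandler):
    """Hypothetical handler that counts how often each tag name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


parser = xml.sax.make_parser()               # returns an XMLReader
parser.setFeature(feature_namespaces, False)  # plain startElement/endElement events
counter = ElementCounter()
parser.setContentHandler(counter)
parser.parse(io.BytesIO(b"<collection><movie/><movie/></collection>"))
print(counter.counts)  # {'collection': 1, 'movie': 2}
```

Using make_parser directly, rather than xml.sax.parse, is useful when the parser needs configuration such as setFeature before parsing starts.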
parse method
The following method creates a SAX parser and parses the xml document:
xml.sax.parse( xmlfile, contenthandler[, errorhandler])
Parameter description:
xmlfile - name of the XML file
contenthandler - must be a ContentHandler object
errorhandler - if specified, must be a SAX ErrorHandler object
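As a sketch of xml.sax.parse called with a file name, the example below writes a small XML file to a temporary directory and reads the shelf attribute back. ShelfReader is a hypothetical handler, and the file content is assumed sample data modelled on movies.xml.

```python
import os
import tempfile
import xml.sax


class ShelfReader(xml.sax.ContentHandler):
    """Hypothetical handler that remembers the shelf attribute of <collection>."""

    def __init__(self):
        super().__init__()
        self.shelf = None

    def startElement(self, name, attrs):
        if name == "collection":
            self.shelf = attrs.get("shelf")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movies.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write('<collection shelf="New Arrivals"><movie title="Ishtar"/></collection>')

    reader = ShelfReader()
    xml.sax.parse(path, reader)   # first argument: file name (a stream also works)

print(reader.shelf)  # New Arrivals
```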
parseString method
The parseString method creates an XML parser and parses an XML string:
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
Parameter description:
xmlstring - the XML string
contenthandler - must be a ContentHandler object
errorhandler - if specified, must be a SAX ErrorHandler object
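The optional errorhandler argument can be illustrated with a small well-formedness check. This is a sketch, not the only way to handle errors: RecordingErrorHandler and is_well_formed are hypothetical names, and the handler re-raises after recording so that parsing stops at the first fatal error.

```python
import xml.sax


class RecordingErrorHandler(xml.sax.ErrorHandler):
    """Hypothetical error handler that keeps the message of each fatal error."""

    def __init__(self):
        self.messages = []

    def fatalError(self, exception):
        self.messages.append(exception.getMessage())
        raise exception   # stop parsing, as the default handler does


def is_well_formed(xml_bytes, error_handler):
    """Return True if xml_bytes parses without a fatal error."""
    try:
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler(), error_handler)
    except xml.sax.SAXParseException:
        return False
    return True


errors = RecordingErrorHandler()
print(is_well_formed(b"<movie><stars>10</stars></movie>", errors))  # True
print(is_well_formed(b"<movie><stars>10</movie>", errors))          # False
print(len(errors.messages))                                         # 1
```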
Python XML parsing example
#!/usr/bin/python3
# -*- coding: UTF-8 -*-

import xml.sax
import xml.sax.handler


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Handle element-start events
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Handle element-end events
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Handle character data
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # Override the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)

    parser.parse("movies.xml")
The code above produces the following output:
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
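Parsing XML with xml.dom
The Document Object Model (DOM) is the standard programming interface recommended by the W3C for processing extensible markup languages.
When a DOM parser parses an XML document, it reads the whole document at once and stores all of its elements in a tree structure in memory. You can then use the functions DOM provides to read or modify the document's content and structure, and you can write the modified content back to an XML file.
In Python, xml.dom.minidom is used to parse XML files. An example follows: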
#!/usr/bin/python3
# -*- coding: UTF-8 -*-

import xml.dom.minidom

# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")

# Print the details of each movie
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))

    movie_type = movie.getElementsByTagName("type")[0]
    print("Type: %s" % movie_type.childNodes[0].data)
    movie_format = movie.getElementsByTagName("format")[0]
    print("Format: %s" % movie_format.childNodes[0].data)
    rating = movie.getElementsByTagName("rating")[0]
    print("Rating: %s" % rating.childNodes[0].data)
    description = movie.getElementsByTagName("description")[0]
    print("Description: %s" % description.childNodes[0].data)
The program above produces the following output:
Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
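The DOM description above also mentions modifying a document and writing it back out. The following is a minimal sketch of that idea with xml.dom.minidom, using an inlined one-movie document. The new stars value and the reviewed attribute are made-up values for illustration only.

```python
from xml.dom.minidom import parseString

# Assumed sample data: a one-movie excerpt in the shape of movies.xml.
doc = parseString(
    '<collection shelf="New Arrivals">'
    '<movie title="Ishtar"><stars>2</stars></movie>'
    '</collection>'
)

movie = doc.getElementsByTagName("movie")[0]
stars = movie.getElementsByTagName("stars")[0]

# Change the text inside <stars> and add a (hypothetical) attribute.
stars.firstChild.data = "3"
movie.setAttribute("reviewed", "yes")

# Serialize the modified tree; write this string to a file to save it.
updated_xml = doc.toxml()
print(updated_xml)
```

The whole tree stays in memory while it is edited, which is what makes in-place changes this simple and also why DOM uses more memory than SAX on large files.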