nodename Selects the child elements of the current node that are named nodename.
/ Selects from the root node.
// Selects matching nodes anywhere below the current node, regardless of their position in the document.
. Selects the current node.
.. Selects the parent of the current node.
@ Selects an attribute.
Path expression
Result
bookstore Selects all bookstore elements that are children of the current node.
/bookstore Selects the root element bookstore. Note: a path that starts with a forward slash (/) is always an absolute path to an element.
bookstore/book Selects all book elements that are direct children of bookstore.
//book Selects all book elements, regardless of their position in the document.
bookstore//book Selects all book elements that are descendants of the bookstore element, no matter how deep below bookstore they are located.
//@lang Selects all attributes named lang.
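The path expressions above can be tried out with Python's standard library. The sketch below is only an illustration: the sample document is made up, and xml.etree.ElementTree implements a limited subset of XPath, so paths are written relative to the root element (for example .//book instead of //book) and attributes are read with get() rather than selected as nodes.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, for illustration only.
XML = """<bookstore>
  <book><title lang="eng">Book A</title><price>29.99</price></book>
  <section>
    <book><title lang="fr">Book B</title><price>39.95</price></book>
  </section>
</bookstore>"""

root = ET.fromstring(XML)  # root is the <bookstore> element

# bookstore/book: book elements that are direct children of bookstore
direct_books = root.findall("book")

# bookstore//book: book elements at any depth below bookstore
all_books = root.findall(".//book")

# . is the current node
current = root.find(".")

# //@lang: ElementTree cannot return attribute nodes, so select the
# elements that carry the attribute and read its value with get()
langs = [e.get("lang") for e in root.findall(".//*[@lang]")]

print(len(direct_books), len(all_books), current.tag, langs)
```

A full XPath engine would accept the expressions exactly as written in the table; the relative forms here are a workaround for ElementTree's subset.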
For example:
1. Find the root node of the page: /
2. Find all input elements on the page: //input
3. Find the input elements that are direct children of the first form element on the page (only inputs exactly one level below the form; a single / denotes a direct-child step): //form[1]/input
4. Find all input elements inside the first form element on the page (any input below the form counts, no matter how many other tags it is nested in; a double // denotes a descendant step): //form[1]//input
5. Find the first form element on the page: //form[1]
6. Find the form element whose id is loginForm: //form[@id='loginForm']
7. Find the input element whose name attribute is username: //input[@name='username']
8. Find the first input element under the form element whose id is loginForm: //form[@id='loginForm']/input[1]
9. Find the input element whose name attribute is continue and whose type attribute is button: //input[@name='continue'][@type='button']
10. Find all id attributes on the page: //@id (to select the elements that carry an id, use //*[@id])
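The difference between / and // in examples 3 and 4 is easiest to see on a small page. The following is a minimal sketch, assuming a made-up, well-formed page and using xml.etree.ElementTree from the standard library (which needs the leading . because it evaluates paths relative to an element); the names helper is hypothetical and exists only to make the results readable.

```python
import xml.etree.ElementTree as ET

# Made-up page, for illustration only; it must be well-formed XML.
PAGE = """<html><body>
  <form id="loginForm">
    <input name="username" type="text"/>
    <div><input name="password" type="password"/></div>
    <input name="continue" type="button"/>
  </form>
  <form id="searchForm"><input name="q" type="text"/></form>
</body></html>"""

page = ET.fromstring(PAGE)

def names(elements):
    """Return the name attribute of each element (hypothetical helper)."""
    return [e.get("name") for e in elements]

all_inputs = names(page.findall(".//input"))          # example 2
direct = names(page.findall(".//form[1]/input"))      # example 3: direct children only
nested = names(page.findall(".//form[1]//input"))     # example 4: any depth
login_form = page.find(".//form[@id='loginForm']")    # example 6
first_input = names(page.findall(".//form[@id='loginForm']/input[1]"))          # example 8
button = names(page.findall(".//input[@name='continue'][@type='button']"))      # example 9
with_id = [e.get("id") for e in page.findall(".//*[@id]")]  # elements that carry an id

print(direct)
print(nested)
```

The password input is wrapped in a div, so it appears in the // result but not in the / result.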
2. Predicates: filter the nodes that a path selects
For example
Path expression
Result
/bookstore/book[1] Selects the first book element that is a child of the bookstore element.
/bookstore/book[last()] Selects the last book element that is a child of the bookstore element.
/bookstore/book[last()-1] Selects the second-to-last book element that is a child of the bookstore element.
/bookstore/book[position()<3] Selects the first two book elements that are children of the bookstore element.
//title[@lang] Selects all title elements that have an attribute named lang.
//title[@lang='eng'] Selects all title elements whose lang attribute has the value eng.
/bookstore/book[price>35.00] Selects all book elements of the bookstore element whose price element has a value greater than 35.00.
/bookstore/book[price>35.00]/title Selects all title elements of the book elements of the bookstore element whose price element has a value greater than 35.00.
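As a rough illustration of these predicates, the sketch below runs them against a made-up bookstore with Python's xml.etree.ElementTree. That module supports positional and attribute predicates but not position()<3 or numeric comparisons such as price>35.00, so those two are emulated in plain Python here; a full XPath engine would evaluate them directly. The titles helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample data, for illustration only.
root = ET.fromstring("""<bookstore>
  <book><title lang="eng">Book A</title><price>30.00</price></book>
  <book><title lang="eng">Book B</title><price>29.99</price></book>
  <book><title lang="fr">Book C</title><price>49.99</price></book>
  <book><title>Book D</title><price>39.95</price></book>
</bookstore>""")

def titles(books):
    """Return the title text of each book element (hypothetical helper)."""
    return [b.findtext("title") for b in books]

first = titles(root.findall("book[1]"))              # /bookstore/book[1]
last = titles(root.findall("book[last()]"))          # /bookstore/book[last()]
penultimate = titles(root.findall("book[last()-1]")) # /bookstore/book[last()-1]

# position()<3 is not supported by ElementTree: emulate it with a slice.
first_two = titles(root.findall("book")[:2])

with_lang = [t.text for t in root.findall(".//title[@lang]")]       # //title[@lang]
eng = [t.text for t in root.findall(".//title[@lang='eng']")]       # //title[@lang='eng']

# price>35.00 is not supported by ElementTree: compare in Python instead.
expensive = titles(b for b in root.findall("book")
                   if float(b.findtext("price")) > 35.00)

print(first, last, penultimate, first_two, with_lang, eng, expensive)
```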
3. Selecting unknown nodes
Wildcard
Description
* Matches any element node.
@* Matches any attribute node.
node() Matches a node of any type.
For example
Path expression
Result
/bookstore/* Selects all child elements of the bookstore element.
//* Selects all elements in the document.
//title[@*] Selects all title elements that have at least one attribute.
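A small sketch of the wildcard examples, again on made-up data with the standard-library xml.etree.ElementTree. ElementTree understands * but not @*, so the last case is emulated by checking each element's attrib dictionary; this is an assumption-laden approximation, not how a full XPath engine evaluates //title[@*].

```python
import xml.etree.ElementTree as ET

# Made-up sample data, for illustration only.
root = ET.fromstring("""<bookstore>
  <book category="web"><title lang="eng">Book A</title><price>30.00</price></book>
  <magazine><title>Mag B</title></magazine>
</bookstore>""")

# /bookstore/*: every child element of bookstore, whatever its name
children = [e.tag for e in root.findall("*")]

# //*: every element in the document, including the root element itself
everything = [e.tag for e in root.iter()]

# //title[@*]: title elements with at least one attribute (emulated)
titles_with_attrs = [t.text for t in root.iter("title") if t.attrib]

print(children, everything, titles_with_attrs)
```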
4. Selecting several paths
You can select several paths at once by joining them with the | operator in a path expression.
Path expression
Result
//book/title | //book/price Selects all title and price elements of all book elements.
//title | //price Selects all title and price elements in the document.
/bookstore/book/title | //price Selects all title elements of the book elements of the bookstore element, plus all price elements in the document.
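The | operator returns the union of both node sets in document order. Python's xml.etree.ElementTree does not support |, so the sketch below only emulates the first two expressions by filtering on tag names while walking the tree in document order; the sample data is made up, and a full XPath engine would accept the expressions as written.

```python
import xml.etree.ElementTree as ET

# Made-up sample data, for illustration only.
root = ET.fromstring("""<bookstore>
  <book><title>Book A</title><price>30.00</price><year>2005</year></book>
  <book><title>Book B</title><price>29.99</price><year>2003</year></book>
</bookstore>""")

wanted = {"title", "price"}

# //title | //price: walk the whole tree in document order, keep matching tags
union_all = [e.text for e in root.iter() if e.tag in wanted]

# //book/title | //book/price: only direct children of book elements
union_in_books = [child.text
                  for book in root.iter("book")
                  for child in book
                  if child.tag in wanted]

print(union_all)
print(union_in_books)
```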
5. Keywords
Keyword
Example
text() book/author/text()
string() book/author/string()
data() book/author/data()
. book/author/.
For example
XML example
<book>
    <author>Tom <em>John</em> cat</author>
    <pricing>
        <price>20</price>
        <discount>0.8</discount>
    </pricing>
</book>
text()
XPath expressions often end in text(), which returns only the text directly inside the specified element.
With the XPath book/author/text(), the extracted content is "Tom" and "cat"; "John" sits inside the em child element, so it is not part of the author element's own text.
string()
The string() function collects the text of every node inside the specified element and concatenates it into a single string.
With the XPath book/author/string(), the extracted content is "Tom John cat", the full text from the start of the element to its end.
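The difference between text() and string() can be reproduced on the author element from the XML example. Python's xml.etree.ElementTree has neither function, so this sketch emulates them under that assumption: the element's own text plus the tail text after each child stands in for text(), and itertext() stands in for string().

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("<book><author>Tom <em>John</em> cat</author></book>")
author = book.find("author")

# Emulates book/author/text(): only text that belongs directly to <author>.
# In ElementTree that is author.text plus the tail text after each child.
own_text = [author.text] + [child.tail for child in author]
own_text = [t for t in own_text if t]  # drop missing pieces

# Emulates book/author/string(): all text inside <author>, concatenated.
full_text = "".join(author.itertext())

print(own_text)
print(full_text)
```

Note that the direct text arrives as two separate pieces, one before and one after the em element, which is why scraped text() results often need joining and stripping.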
data()
In most cases data() and string() are interchangeable, and frequent use of data() is not recommended: it is reported to hurt XPath performance.
With the XPath book/pricing/data(), the extracted content is 20 and 0.8, returned as separate values. Their type is not string but xs:anyAtomicType, so mathematical functions can be applied to them.
Use data() when you extract numbers that you want to calculate with; text() and string() are not suitable for that, because XPath does not support arithmetic on strings.
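To illustrate why numeric extraction matters, here is a minimal sketch using the pricing element from the XML example. It does not call data(), which Python's xml.etree.ElementTree does not provide; instead it assumes the scraped values arrive as strings and converts them explicitly before doing arithmetic. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book><pricing><price>20</price><discount>0.8</discount></pricing></book>"
)
pricing = book.find("pricing")

# The scraped values are strings; convert them before calculating.
raw_values = [child.text for child in pricing]
numbers = [float(v) for v in raw_values]

price, discount = numbers
final_price = price * discount  # arithmetic only works on the converted numbers

print(raw_values, numbers, final_price)
```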
Author: little salted fish YYY
Source: blogs.com/pythonywy/p/11082153.html.
About the author: No matter how long the road is, it will come out step by step. No matter how short the road is, it is impossible to walk without taking a step.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license. When reposting, please credit the author and the source.