In the Previous Tutorial, we learned to locate elements on the page by using their id, class, and name attributes.
We also discussed some challenges in that tutorial: sometimes we cannot uniquely identify an element by its id, class, or name attribute. This is where XPath comes to our rescue.
In this tutorial, we’ll learn about XPath.
What is XPath?
Every web page is a document made up of HTML tags, like this:
<!DOCTYPE html>
<html>
  <body>
    <h1>My First Heading</h1>
    <p>My first paragraph.</p>
  </body>
</html>
As we know, XML documents also consist of elements and attributes:
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
XPath is a query language for navigating the elements and attributes of an XML document. With XPath we can query a web page as if it were an XML document. To locate a particular element, we write an XPath query that uses the element’s tag name, its attributes, or both. The query returns the matching element or elements. Every modern browser has a built-in XPath engine.
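To make the idea of querying a document concrete, here is a minimal sketch in Python. Python is not part of this tutorial, so treat the choice as an assumption made for illustration. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and evaluates every path relative to a context element, which is why the paths start with a dot.

```python
import xml.etree.ElementTree as ET

# The note document from above.
NOTE_XML = """
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
"""

note = ET.fromstring(NOTE_XML)  # note is the root <note> element

# "./to" selects the <to> element that is a direct child of the root.
recipient = note.find("./to")
print(recipient.text)  # Tove

# ".//heading" selects a <heading> element at any depth below the root.
heading = note.find(".//heading")
print(heading.text)  # Reminder
```

A browser's XPath engine works on the live page rather than on a string, but the query itself reads the same way.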
How to write an XPath query?
Let us learn to write an XPath query through an example:
- Open the Chrome browser and navigate to www.google.com
- Inspect the Google Search Box using Chrome’s Inspect Element. If you missed how to do that, you can check This Tutorial.
- Let us now look closely at the source code for the Google Search input box. Inspect it again.
<table id="gs_id0" class="gstl_0 lst-t" cellspacing="0" cellpadding="0" style="height: 27px; padding: 0px;">
  <tbody>
    <tr>
      <td id="gs_ttc0" style="white-space: nowrap;" dir="ltr"></td>
      <td id="gs_tti0" class="gsib_a">
        <div id="gs_lc0" style="position: relative;">
          <input id="gbqfq" class="gbqfif" type="text" value="" autocomplete="off" name="q" style="border: medium none; padding: 0px; margin: 0px; height: auto…; width: 100%; background: url('" dir="ltr" spellcheck="false"> </input>
        </div>
      </td>
      <td class="gsib_b"></td>
    </tr>
  </tbody>
</table>
- We can construct an XPath query to locate this element: //input[@name='q']
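A query such as //input[@name='q'] (an input element whose name attribute is q) can be tried outside the browser as well. The sketch below is an illustration under assumptions: it uses Python's standard xml.etree.ElementTree, and the markup is a simplified, well-formed copy of the search box source with the style attributes left out.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the search box markup (style attributes omitted).
SEARCH_BOX = """
<table id="gs_id0" class="gstl_0 lst-t">
  <tbody>
    <tr>
      <td id="gs_ttc0"></td>
      <td id="gs_tti0" class="gsib_a">
        <div id="gs_lc0">
          <input id="gbqfq" class="gbqfif" type="text" name="q"/>
        </div>
      </td>
      <td class="gsib_b"></td>
    </tr>
  </tbody>
</table>
"""

root = ET.fromstring(SEARCH_BOX)

# ElementTree paths are relative, so //input[@name='q'] gets a leading dot.
search_input = root.find(".//input[@name='q']")
print(search_input.get("id"))  # gbqfq
```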
How to verify if the XPath query is correct?
We can use the search functionality of the Chrome developer tools. Right-click anywhere on the page and select ‘Inspect’ to open the developer tools. Click the ‘Elements’ tab and press Ctrl+F to open the search box. Type the locator (XPath, CSS selector, etc.) and verify that the expected element is highlighted in the search results.
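The same check can be sketched in code: a good locator should match exactly one element, not zero and not several. The example below is only an illustration. The count_matches helper, the tiny form markup, and the btnK name are hypothetical, and Python's xml.etree.ElementTree stands in for the browser's XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment with two input elements.
PAGE = """
<form>
  <input type="text" name="q"/>
  <input type="submit" name="btnK"/>
</form>
"""

def count_matches(root, path):
    """Return how many elements the path selects below root."""
    return len(root.findall(path))

root = ET.fromstring(PAGE)

unique = count_matches(root, ".//input[@name='q']")   # one match: a usable locator
ambiguous = count_matches(root, ".//input")           # two matches: too broad
missing = count_matches(root, ".//input[@name='x']")  # no match: wrong locator

print(unique, ambiguous, missing)  # 1 2 0
```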
Understanding the XPath query
The query //input[@name='q'] says: find an input tag ANYWHERE (// indicates anywhere) in the document that has a name property whose value is q.
Difference between an XPath starting from ‘/’ and one starting from ‘//’?
A single slash at the start of an XPath instructs the XPath engine to look for elements starting from the root node. If we had written /html/body, the engine would have searched from the root of the document. A double slash at the start of an XPath instructs the engine to look for matching elements ANYWHERE in the document.
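The difference can be seen in a small sketch. One assumption to note: Python's xml.etree.ElementTree does not accept absolute paths on an element, so the example starts from the root html element and uses "./" for a step from the root and ".//" for a search at any depth. The page markup is a made-up minimal document.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal page; the input sits three levels below the root.
PAGE = """
<html>
  <body>
    <div>
      <input name="q"/>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)  # root is the <html> element

# Counterpart of /html/body: step from the root to its immediate child <body>.
body = root.find("./body")

# A single step from the root finds nothing: <input> is not a direct child of <html>.
direct = root.findall("./input")

# A search at any depth finds the <input> wherever it is.
anywhere = root.findall(".//input")

print(body.tag, len(direct), len(anywhere))  # body 0 1
```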
What does single-slash ‘/’ mean if used inside the XPath?
A single slash / anywhere inside an XPath tells the engine to look for an element immediately inside its parent element. For example, for our Search box source code, we can also construct the XPath like this: //table/tbody/tr/td[2]/div/input[@id='gbqfq']
It reads like this: Hey, XPath engine. Find an element with a table tag ANYWHERE (//) in the document. Make sure that element has an immediate child element named tbody, and that the tbody element has an immediate child named tr. A tr can have many immediate td children; I am interested in its SECOND (td[2]) child. This td element should have an immediate div child. And the next element down the line of descent should be an input element. And this ‘input’ element should have an id property whose value is gbqfq.
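A step-by-step path such as //table/tbody/tr/td[2]/div/input[@id='gbqfq'] only matches when every level is spelled out. The sketch below illustrates this on a simplified copy of the search box markup. The body wrapper and the omission of style attributes are assumptions made to keep the example small, and Python's xml.etree.ElementTree is used in place of a browser.

```python
import xml.etree.ElementTree as ET

# Simplified search box markup, wrapped in <body> so the table is a descendant.
PAGE = """
<body>
  <table id="gs_id0">
    <tbody>
      <tr>
        <td id="gs_ttc0"></td>
        <td id="gs_tti0" class="gsib_a">
          <div id="gs_lc0">
            <input id="gbqfq" class="gbqfif" type="text" name="q"/>
          </div>
        </td>
        <td class="gsib_b"></td>
      </tr>
    </tbody>
  </table>
</body>
"""

root = ET.fromstring(PAGE)

# Every step names the immediate child of the previous element.
strict = root.find(".//table/tbody/tr/td[2]/div/input[@id='gbqfq']")

# Leaving out the <div> level breaks the chain: <input> is not a direct child of <td>.
skipped = root.find(".//table/tbody/tr/td[2]/input[@id='gbqfq']")

print(strict.get("name"), skipped)  # q None
```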
What does double-slash ‘//’ mean if used inside the XPath?
A double slash // inside an XPath tells the engine to look for any descendant (child, grandchild, great-grandchild, and so on) inside the parent element. So for the same classic Google Search box, we can also construct the XPath like this: //table//input[@class='gbqfif']
It is saying: Hey, get me a table tag element ANYWHERE in the document. Make sure that inside that table element there is an ‘input’ tag element. I don’t care if the input element is the table’s child, grandchild, or great-grandchild. I just care that the ‘input’ tag is ENCLOSED by ‘table’. And yes, don’t forget, the ‘input’ tag should have a ‘class’ property whose value is gbqfif.
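A path such as //table//input[@class='gbqfif'] ignores how many levels separate the table from the input. The sketch below contrasts it with a single slash on a simplified copy of the search box markup. As before, the body wrapper and the trimmed attributes are assumptions, and Python's xml.etree.ElementTree stands in for the browser's engine.

```python
import xml.etree.ElementTree as ET

# Simplified search box markup: <input> is four levels below <table>.
PAGE = """
<body>
  <table id="gs_id0">
    <tbody>
      <tr>
        <td id="gs_tti0" class="gsib_a">
          <div id="gs_lc0">
            <input id="gbqfq" class="gbqfif" type="text" name="q"/>
          </div>
        </td>
      </tr>
    </tbody>
  </table>
</body>
"""

root = ET.fromstring(PAGE)

# Double slash: <input> may sit at any depth inside <table>.
enclosed = root.find(".//table//input[@class='gbqfif']")

# Single slash: <input> would have to be an immediate child of <table>.
immediate = root.find(".//table/input[@class='gbqfif']")

print(enclosed.get("id"), immediate)  # gbqfq None
```

The double-slash form is shorter and survives changes to the intermediate markup, at the cost of being less specific about where the element sits.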
Oh, man. You are driving me nuts.
I would suggest that you first carefully examine the source code for the Google Search Box and read the above paragraphs again. Then try to relate each sentence to the part of the source code it refers to.
Please don’t rush until you grasp XPath basics properly.
In the Next Tutorial, we will dive deeper into the XPath ocean.