How to Use XPath and CSS to Scrape Data for Free

This article will help you quickly understand how to parse data using XPath and CSS. It is designed for people who are not familiar with either. We will cover a little theory and the basic syntax needed for data parsing, which is enough to collect data from the vast majority of sites.

Using XPath for Parsing

First of all, you need to understand what XPath (XML Path Language) is: a language for querying elements of XML markup. This means that by sending a request composed in a certain way, you receive the necessary data in response. A simple analogy is an address in the browser's address bar or a path to a folder in a file explorer: by typing the correct path, you get to the desired site or folder. XPath works the same way. We write the path and get to the necessary data; the only difference from the address bar is that we use XPath for searching.

For example, the code might look like this:
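
As a stand-in, the sketch below holds a small hypothetical page in a Python string and runs a path query against it. The markup, class name, links, and product names are invented for illustration. It uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and requires well-formed markup, so real pages will usually need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page markup (not taken from a real site).
PAGE = """
<html>
  <body>
    <div class="catalog">
      <ul>
        <li><a href="/item/1">First product</a></li>
        <li><a href="/item/2">Second product</a></li>
      </ul>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())  # root is the <html> element

# The path is written relative to the root, like folders in a file path.
links = root.findall("./body/div/ul/li/a")
names = [link.text for link in links]
print(names)
```

The query reads like a folder path: from the root, into `body`, then `div`, `ul`, every `li`, and finally the `a` element inside each one.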

 


If you right-click an empty area of the page and select “site source code” or “view page code” in the context menu (the wording varies between browsers), you will see the page's code, from which the parser extracts data.


As you can see, the code is a tree structure in which each element is marked in a certain way; our task is to show the parser the path to the element we need.
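
To make the tree structure visible, here is a minimal sketch that prints each element indented under its parent. The markup string and the `outline` helper are hypothetical, written only for this illustration, and the parser is again the standard library's `xml.etree.ElementTree`, which expects well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, kept on one line for brevity.
MARKUP = "<html><body><div><ul><li>One</li><li>Two</li></ul></div></body></html>"

def outline(element, depth=0):
    """Return one indented line per element, children nested under parents."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(MARKUP))
print("\n".join(tree_lines))
```

Each level of indentation corresponds to one step in a path, so the two `li` elements sit four steps below `html`.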

We will consider further actions using the example of our database catalog at address.

For further work we will need the developer tools built into the browser: in Chrome, open the context menu and choose “view code”; in Firefox, open the context menu and choose “explore”.

 

So, let’s find the path to the product card name:

Right-click on the product name. In the context menu that opens, select “view code”, and the desired element will be highlighted in the code. How can you determine the path to it? As with the file explorer, go down from the top category to the “desired folder”. The top directory is “html”, then “body”, then several “div” and “ul” blocks. If there are several blocks with the same name at some level, we write in square brackets which one, in order, we need:
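
A small sketch of such a path, assuming a hypothetical page with two sibling `div` blocks and three list items (all names and markup are invented for illustration). In full XPath the path would be `/html/body/div[2]/ul/li[3]/a`. The standard library's `xml.etree.ElementTree` starts from the root element, so the leading `/html` is written as `.` here.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two sibling <div> blocks and several <li> items.
PAGE = """<html>
  <body>
    <div><p>Header</p></div>
    <div>
      <ul>
        <li><a>Product A</a></li>
        <li><a>Product B</a></li>
        <li><a>Product C</a></li>
      </ul>
    </div>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# Indexes in square brackets are 1-based: div[2] is the second <div>
# under <body>, and li[3] is the third <li> under <ul>.
third = root.find("./body/div[2]/ul/li[3]/a")
print(third.text)  # Product C
```

Without the indexes, `./body/div/ul/li/a` would match all three links; the square brackets narrow the path to exactly one element at each level.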
