Cheerio and `.find()` return too many elements compared to jQuery

Basically I am trying to parse an HTML string and extract some information using Cheerio.js.

My HTML is as follows (reduced and simplified, of course):

    <html>
                            <a href="/link_1.php">Link 1</a>
                            <a href="/link_2.php">Link 2</a>
                            <a href="/link_3.php">Link 3</a>
                            <a href="/link_4.php">Link 4</a>
                            <a href="/link_5.php">Link 5</a>

    My code is this one:

    var cheerio = require("cheerio");
    var $ = cheerio.load(html);
    var page = $.root();
    var tr = page.find("tr");
    console.log(tr.find("> :nth-child(2) a").length);

    You can try it here.

    I would expect the code to return 2, because there are two links in the second direct child of the tr element. However, it returns 5: every link inside the tr is returned.

    I tried the same thing with jQuery and the result is as it should be, see.

    I also noticed that removing <html> tag makes it work correctly, but I do not know why.

    Am I doing something wrong or should I report this to developers as a bug?

    Edit: I just opened an issue on GitHub.

One solution collected from the web for “Cheerio and `.find()` return too many elements compared to jQuery”

    The following fixes your issue. It helps to select the direct child first and then search for the links inside it with a second find() call, rather than doing everything in one combined find() selector:

    var $ = cheerio.load(html);
    var page = $.root();
    var tr = page.find("tr");
    console.log(tr.find("> :nth-child(2)").find('a').length)