The Real Story of a Generic Engine for Fetching Pages

by Jule

Discussion category: searxng

Most SearXNG engines rely on fast, direct HTTP requests to fetch results, but much of the modern web only delivers rich, dynamic content after JavaScript runs. That is where the traditional xpath and json_engine engines fall short: when a provider blocks plain HTTP clients or serves pages that need JavaScript to render, they get nothing useful back. Today the only workaround is to write custom engine code for each service, which wastes time and piles up technical debt.

A unified xpath_proxy engine would change this by wrapping the familiar xpath pattern in a flexible external rendering layer. It keeps SearXNG’s URL templating for query, paging, language, time range, and safe search, but delegates the actual page fetch to a configurable HTTP endpoint. Instead of building a new engine for each provider, admins could simply add an entry to the shared YAML settings with a proxy URL, a search URL, and XPath expressions, just as they do for the existing xpath engine.
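The templating part can be sketched in Python. This is a hypothetical illustration, not the prototype's code: the example.org template, the `build_search_url` helper, and both value maps are made up, and only loosely mimic the `{query}`-style placeholders that the xpath engine's `search_url` accepts. The time-range keys and the 0/1/2 safe-search levels follow SearXNG's conventions; the provider-side values they map to are assumptions.

```python
from urllib.parse import quote_plus

# Hypothetical provider template; placeholder names mimic the xpath engine's search_url.
SEARCH_URL = (
    "https://example.org/search?q={query}&page={pageno}"
    "&lang={lang}&t={time_range}&safe={safe_search}"
)

# Assumed provider-specific values for SearXNG's time-range and safe-search levels.
TIME_RANGE_MAP = {"day": "24h", "week": "7d", "month": "1m", "year": "1y"}
SAFE_SEARCH_MAP = {0: "off", 1: "moderate", 2: "strict"}


def build_search_url(template: str, query: str, pageno: int = 1, lang: str = "all",
                     time_range: str = "", safe_search: int = 0) -> str:
    """Fill the placeholders of a search URL template (hypothetical helper)."""
    return template.format(
        query=quote_plus(query),  # URL-encode the user's query
        pageno=pageno,
        lang=lang,
        time_range=TIME_RANGE_MAP.get(time_range, ""),  # empty or unknown -> no filter
        safe_search=SAFE_SEARCH_MAP.get(safe_search, "off"),
    )
```

For example, `build_search_url(SEARCH_URL, "open search", pageno=2, lang="de", time_range="week", safe_search=1)` yields `https://example.org/search?q=open+search&page=2&lang=de&t=7d&safe=moderate`. Because `str.format` ignores keyword arguments a template doesn't use, a provider without, say, a safe-search parameter can simply leave that placeholder out.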

The core logic:

  • Build the full search URL with SearXNG’s template system
  • Submit the URL to an external rendering service via POST
  • Parse the result with the same XPath logic as xpath.py, whether the service returns raw HTML or JSON
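The second and third steps can be sketched as two small helpers. Everything about the rendering service's contract here is an assumption: the JSON request body `{"url": ...}`, the `html` field in a JSON reply, and the helper names are hypothetical, since each sidecar (a Playwright-based one, for example) defines its own API. The HTML these helpers return would then go through the existing xpath.py extraction, which is not repeated here.

```python
import json


def render_request(target_url: str, proxy_url: str) -> dict:
    """Describe the POST sent to the rendering service (hypothetical contract).

    The body shape {"url": ...} is an assumption; a real renderer defines its own.
    """
    return {
        "url": proxy_url,
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "data": json.dumps({"url": target_url}),
    }


def extract_html(content_type: str, body: str, html_key: str = "html") -> str:
    """Return the rendered page, whether the service answers with raw HTML
    or with JSON wrapping it under `html_key` (an assumed field name)."""
    if "json" in content_type.lower():
        payload = json.loads(body)
        if not isinstance(payload, dict) or html_key not in payload:
            raise ValueError(f"rendering service JSON has no {html_key!r} field")
        return payload[html_key]
    # Anything else is treated as the rendered HTML itself.
    return body
```

The request dictionary is shaped loosely after the `url`, `method`, `headers`, and `data` fields that SearXNG engines fill in when building an outgoing request, so swapping the direct fetch for a rendered one only changes where and how the request is sent.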

This approach avoids redundant boilerplate, reduces maintenance, and keeps the project vendor-agnostic, since any rendering service that accepts the POST request can be plugged in. It also addresses two current problems: JavaScript-heavy platforms and rising anti-scraping barriers.

The proposal draws on real use cases: Material-Scientist showed how Playwright-powered custom engines work for sites like Imgur and Google, but those engines rely on hardcoded proxy calls that don’t scale. This engine abstracts that complexity into a clean, reusable pattern.

As discussion #5651 shows, the need for JavaScript rendering is no longer theoretical: many major providers now block or obfuscate content without a browser context. The sidecar concept that return42 highlighted, where a rendering service acts as a trusted intermediary, complements this design well.

But here’s the catch: the engine must work out of the box with SearXNG’s current tooling, without new dependencies or architectural changes. It needs to be lightweight, configurable, and safe, adding no latency beyond the rendering call itself and no new security risks.
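One way to keep such an engine safe and predictable is to validate its settings once at startup. The sketch below is hypothetical: the `validate_proxy_settings` name and the 30-second cap are illustrative choices, not existing SearXNG options. It rejects proxy URLs that are not absolute HTTP(S) URLs and bounds how long the engine may wait on the renderer.

```python
from urllib.parse import urlparse

MAX_TIMEOUT = 30.0  # hypothetical upper bound, in seconds


def validate_proxy_settings(proxy_url: str, timeout: float) -> None:
    """Raise ValueError for unsafe or unbounded proxy settings (hypothetical check)."""
    parsed = urlparse(proxy_url)
    # Only absolute http(s) URLs; this rules out file://, relative paths, etc.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"proxy_url must be an absolute http(s) URL, got {proxy_url!r}")
    # A bounded timeout keeps a slow renderer from stalling the whole search.
    if not 0 < timeout <= MAX_TIMEOUT:
        raise ValueError(f"timeout must be in (0, {MAX_TIMEOUT}] seconds, got {timeout}")
```

Failing at startup rather than per query means a misconfigured entry is caught once, instead of surfacing as silent empty results.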

For those asking: yes, a working prototype exists, and I’m open to submitting a PR if this solves a real pain point. At its core, this isn’t just about fetching pages; it’s about future-proofing open search against the growing wall of JavaScript-only content.

In a world where many sites serve content only to real browsers and block plain HTTP requests, this generic engine could be the bridge SearXNG needs to keep delivering meaningful, accessible results without reinventing the wheel.