The Real Story of a Generic Engine for Fetching Pages
Discussion in searxng/searxng
Most SearXNG engines fetch results with fast, direct HTTP requests, but a growing share of the web only delivers rich, dynamic content after JavaScript runs in a browser. That is where the generic xpath and json_engine engines fall short: they cannot help when a provider blocks plain requests or only returns JS-heavy responses. The current workflow forces developers to write custom engine code for each such service, which wastes time and creates technical debt.
A unified xpath_proxy engine would change this by wrapping the familiar xpath pattern in an external rendering layer. It keeps SearXNG’s URL templating for query, paging, language, time range, and safe search, but delegates page fetching to a configurable HTTP endpoint. Instead of building a new engine for each provider, admins could add an entry to the shared YAML settings with a proxy URL, a search URL, and XPath expressions, just as they already do for the existing xpath engine.
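For illustration, here is what such an entry might carry, written as a Python dict that mirrors the YAML keys, plus a small check an admin-facing tool could run. The search_url, results_xpath, url_xpath, title_xpath and content_xpath keys follow the existing xpath engine’s settings; proxy_url, REQUIRED_KEYS and validate_engine_settings are hypothetical names for this proposal, and the URLs are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical engine entry mirroring what the YAML block in settings.yml could hold.
EXAMPLE_ENGINE = {
    "name": "example rendered",
    "engine": "xpath_proxy",  # the proposed engine name
    "proxy_url": "http://renderer:3000/render",  # hypothetical rendering-service endpoint
    "search_url": "https://example.com/search?q={query}&page={pageno}",
    "results_xpath": "//div[@class='result']",
    "url_xpath": ".//a/@href",
    "title_xpath": ".//h3",
    "content_xpath": ".//p",
}

# Keys this sketch treats as mandatory (an assumption, not SearXNG's rule).
REQUIRED_KEYS = ("proxy_url", "search_url", "results_xpath", "url_xpath", "title_xpath")


def validate_engine_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry looks usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if not settings.get(key)]
    # The search URL must contain the query placeholder, or every search is identical.
    if "{query}" not in settings.get("search_url", ""):
        problems.append("search_url has no {query} placeholder")
    # The proxy must be an absolute http(s) URL so the engine knows where to POST.
    proxy = urlparse(settings.get("proxy_url", ""))
    if proxy.scheme not in ("http", "https") or not proxy.netloc:
        problems.append("proxy_url must be an absolute http(s) URL")
    return problems


print(validate_engine_settings(EXAMPLE_ENGINE))  # → []
```

The point of the sketch is that the only provider-specific knowledge lives in data, so adding a provider means editing settings rather than shipping a new engine module.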
The core logic:
- Build the full search URL with SearXNG’s template system
- Submit the URL to an external rendering service via POST
- Parse the response with the same XPath logic as xpath.py, handling either raw HTML or JSON depending on what the service returns
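The three steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the prototype mentioned later: it assumes the rendering service accepts a JSON body like {"url": ..., "timeout": ...} and answers with either raw HTML or a JSON object carrying the page under an "html" key. The placeholder names ({query}, {pageno}, {lang}, {time_range}, {safe_search}) are modeled on the existing xpath engine’s search_url templating, while PAGE_SIZE, FIRST_PAGE_NUM and all three function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote_plus

PAGE_SIZE = 1  # hypothetical: how far {pageno} advances per results page
FIRST_PAGE_NUM = 1  # hypothetical: value of {pageno} on the first page


def build_search_url(search_url: str, query: str, pageno: int = 1,
                     lang: str = "all", time_range: str = "",
                     safe_search: str = "") -> str:
    """Step 1: fill the search_url template (str.format ignores unused keys)."""
    return search_url.format(
        query=quote_plus(query),
        pageno=(pageno - 1) * PAGE_SIZE + FIRST_PAGE_NUM,
        lang=lang,
        time_range=time_range,
        safe_search=safe_search,
    )


def build_render_request(proxy_url: str, target_url: str,
                         timeout_ms: int = 10000) -> urllib.request.Request:
    """Step 2: wrap the target URL in a POST to the rendering service.

    The JSON body shape is an assumption, not an existing API.
    """
    body = json.dumps({"url": target_url, "timeout": timeout_ms}).encode("utf-8")
    return urllib.request.Request(
        proxy_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_markup(body: bytes, content_type: str) -> str:
    """Step 3: return the HTML to run XPath on, whether the service sent raw
    HTML or JSON wrapping it (assumed to be under an "html" key)."""
    text = body.decode("utf-8", errors="replace")
    if content_type.split(";")[0].strip().lower() == "application/json":
        return json.loads(text)["html"]
    return text
```

For example, build_search_url("https://example.com/search?q={query}&p={pageno}", "open search", pageno=2) gives https://example.com/search?q=open+search&p=2, which build_render_request then wraps for the proxy. The string from extract_markup is what would be handed to lxml and the unchanged xpath.py extraction (results_xpath, url_xpath, and so on); inside an actual engine, SearXNG’s own request machinery would send the POST rather than urllib.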
This approach avoids redundant boilerplate, reduces maintenance, and keeps the project vendor-agnostic, since any rendering service that accepts a URL can sit behind the proxy endpoint. It directly addresses today’s JavaScript-heavy platforms and rising anti-scraping barriers.
The proposal draws on real use cases: Material-Scientist showed how custom engines backed by Playwright work for sites like Imgur and Google, but each of those engines hardcodes its own proxy calls, which doesn’t scale. This engine abstracts that complexity into a clean, reusable pattern.
As discussion #5651 shows, the need for JS rendering is no longer theoretical: many major providers now block or obfuscate content without a browser context. The sidecar concept that return42 highlighted, where a rendering service acts as a trusted intermediary, complements this design well.
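To make the sidecar idea concrete, here is a minimal stand-in for the rendering service using only the standard library. It assumes a hypothetical contract (a JSON body with a "url" field in, a JSON object with an "html" field out); handle_render, fake_render and RenderHandler are illustrative names, and a real sidecar would replace fake_render with an actual headless-browser step, for example Playwright’s page.content() after page.goto(url).

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable


def handle_render(body: bytes, render: Callable[[str], str]) -> tuple[int, dict]:
    """Protocol core: JSON {"url": ...} in, JSON {"html": ...} out (assumed shape)."""
    try:
        payload = json.loads(body)
        url = payload["url"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'url' field"}
    # Only render web pages; refuse file:// and other schemes.
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        return 400, {"error": "url must be an http(s) URL"}
    return 200, {"html": render(url)}


def fake_render(url: str) -> str:
    """Stand-in for a real browser render; returns fixed HTML naming the URL."""
    return f"<html><body><p>rendered {url}</p></body></html>"


class RenderHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_render that SearXNG would POST to."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_render(self.rfile.read(length), fake_render)
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

Serving it locally would be HTTPServer(("127.0.0.1", 3000), RenderHandler).serve_forever(), with HTTPServer imported from http.server; the port is arbitrary. Keeping the protocol in a plain function lets it be tested without a browser or a network socket.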
But here’s the catch: the engine must work out of the box with SearXNG’s current tooling, without new dependencies or architectural changes. It needs to be lightweight, configurable, and safe, adding as little latency as possible and no new security risks.
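One way to make the "safe" requirement concrete, offered as an assumption rather than part of the proposal, is to let the rendering path load only hosts an admin has configured, so it cannot be abused to fetch arbitrary or internal URLs. The is_allowed_target helper and the allowlist are hypothetical.

```python
from urllib.parse import urlsplit


def is_allowed_target(url: str, allowed_hosts: set[str]) -> bool:
    """Accept only http(s) URLs whose host is an allowed host or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Match the exact host or a true subdomain ("www.example.com"),
    # but not look-alikes such as "evil-example.com".
    return any(host == allowed or host.endswith("." + allowed) for allowed in allowed_hosts)
```

For example, with {"example.com"} as the allowlist, https://www.example.com/search?q=x passes while file:///etc/passwd, http://169.254.169.254/ and https://example.com.evil.net are rejected. Latency can be bounded separately, for instance through the engine’s existing timeout setting or the timeout argument of urllib.request.urlopen.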
For those asking: yes, a working prototype exists, and I’m happy to open a PR if this solves a real pain point. At its core, this isn’t just about fetching pages; it’s about future-proofing open search against the growing wall of JavaScript.
In a world where headless browsers have to simulate real users and providers block plain HTML requests, this generic engine could be the bridge SearXNG needs to keep delivering meaningful, accessible results without reinventing the wheel.