Splice - Potluck, Cambria, and Wildcard
Mainly in response to Geoffrey's research; read more on: Potluck, Cambria, and Wildcard.
Hello, Peter and Geoffrey;
Especially Geoffrey; I spent the day poring through many of your PhD publications; these seemed to clarify many ideas of user experience I'd been aiming for subconsciously. I'd like to describe a serious problem space in the middle of your three research programs: Wildcard, Potluck, and Cambria.
Of these examples, the most inspiring experience to me is Wildcard. I am an eager web scraper, and migrating from Ruby / Capybara to Elixir / Wallaby has hardly been enough for me: I'd like equal access to web pages that require sign-in, or pages that rely on doom-scrolling, and accomplishing these aims requires the JavaScript execution layer. Earlier, I learned some of the Gleam language by compiling a simple plugin, and I am eager to build upon this base again.
I did mess around, and had no luck spinning up a copy of Wildcard. TypeScript has changed a bunch since 2021 (~1500 errors), so I am simply reading many of the modules and learning how much I can apply from the early approach. The core pieces (embedded SQLite, Handsontable, and Bootstrap) have all changed a bunch since then, so a ground-up rebuild seems simpler.
Sidebar: Launch Wildcard.
Spoke much too soon! I came back around to examine the error, now using web-ext to speed up the process. I also added a minimal flake.nix, so I could pin all dependencies in a reproducible local shell.
In my normal commands, I rely on my Nushell helper nd, so web-ext run becomes nd web-ext run. On a normal machine, such a command reads as: nix develop --command web-ext run. These prefixes are simply assurance that all the dependencies are in place, so I'll assume you can manage dependencies on your machine and ignore the prefixes here.
yarn install
node build.js # many errors! no problem, ignore all.
web-ext run -u news.ycombinator.com
I was really surprised that web-ext managed to open a Firefox window at all; as far as I could tell, the compilation had failed under the burden of endless TypeScript semicolon concerns. Yet there was an installed plugin in the window, and I could see in the console an error message referencing the HackerNews site adapter.
I poked around and realized that the HN markup has changed since 2021 - the selector a.storylink is now .title a. I made the change in my copy, and upon node ./build.js the running web-ext process picked up on the change.
I needed a second to realize the page already had the Wildcard pane open, happily rendering all of the HN records as promised. Research needs to be reproducible, no? I'm incredibly pleased to accomplish this at the end of a 36-hour in-office research marathon.
I managed to fill a dry-erase board, merging ideas from the three programs into a single experience. From a blank browser pop-up panel, I imagined adding PGlite; simple enough, I'm sure. At this phase I realized the core logic of Wildcard is the bidirectional mapping between the HTML page and the database table. I read a couple of the site adapters, and can see that each one maps a recognized URL into a pre-coded arrangement of columns in the table. Nice and simple, and I'd like to praise Geoffrey's efficiency in cracking these specific use cases.
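As a rough sketch of that mapping, here is how a site adapter might pair a URL pattern with a fixed set of columns and move data in both directions. Every name here (SiteAdapter, PageItem, toRows, toPage, findAdapter) is hypothetical and is not taken from the Wildcard codebase; PageItem is a plain-object stand-in for scraped DOM nodes, so the sketch runs outside a browser.

```typescript
// Hypothetical types for illustration; these are not Wildcard's real interfaces.
type Row = Record<string, string>;

// Stand-in for a scraped DOM node, so the sketch runs without a browser.
interface PageItem {
  id: string;
  fields: Record<string, string>;
}

interface SiteAdapter {
  name: string;
  urlPattern: RegExp;
  columns: string[];
}

// Page -> table: one row per item, keeping only the adapter's columns.
function toRows(adapter: SiteAdapter, items: PageItem[]): Row[] {
  return items.map((item) => {
    const row: Row = { id: item.id };
    for (const col of adapter.columns) {
      row[col] = item.fields[col] ?? "";
    }
    return row;
  });
}

// Table -> page: copy edited cells back onto the items with matching ids.
function toPage(adapter: SiteAdapter, items: PageItem[], rows: Row[]): PageItem[] {
  const byId = new Map(rows.map((r) => [r["id"], r] as const));
  return items.map((item) => {
    const row = byId.get(item.id);
    if (!row) return item;
    const fields = { ...item.fields };
    for (const col of adapter.columns) {
      if (col in row) fields[col] = row[col];
    }
    return { id: item.id, fields };
  });
}

// Pick the first adapter whose pattern recognizes the URL, if any.
function findAdapter(adapters: SiteAdapter[], url: string): SiteAdapter | undefined {
  return adapters.find((a) => a.urlPattern.test(url));
}

// An assumed adapter definition for the HN front page.
const hackerNews: SiteAdapter = {
  name: "hn",
  urlPattern: /^https:\/\/news\.ycombinator\.com\//,
  columns: ["title", "link"],
};
```

Fields the adapter does not list as columns never reach the table, and they survive the trip back to the page untouched.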
Again, much progress has happened since 2021. I considered how to splice Cambria and Wildcard. In addition to mapping one JSON body to another, Cambria also handles incremental changes using the JSON Patch spec (RFC 6902). So ideal! Browser execution means open WebSockets, so a scraper can stream rather than poll. This is the only (sensible) approach on pages that rely on doom-scrolling mechanics. Patch streams may also help someone roll up group discussions into periodical summaries - you could keep Slack open in a long-running background tab, on a headless machine, and sync the PGlite table to the eerily absent command-line program all of us need.
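To make the patch-stream idea concrete, here is a minimal sketch that applies a small subset of JSON Patch (RFC 6902) to an in-memory table. The assumptions are mine: only add, replace, and remove are handled, paths are limited to /rowId or /rowId/column, and the Table shape is a stand-in for a real PGlite table. It is not how Cambria applies patches.

```typescript
// A subset of RFC 6902 operations; move, copy, and test are omitted in this sketch.
type PatchOp =
  | { op: "add" | "replace"; path: string; value: unknown }
  | { op: "remove"; path: string };

// Hypothetical in-memory table: row id -> column -> value.
type Table = Record<string, Record<string, unknown>>;

// Split a JSON Pointer (RFC 6901) into unescaped segments.
function parsePointer(path: string): string[] {
  if (!path.startsWith("/")) throw new Error(`bad pointer: ${path}`);
  return path
    .slice(1)
    .split("/")
    .map((s) => s.replace(/~1/g, "/").replace(/~0/g, "~"));
}

// Apply one operation addressed as /<rowId> or /<rowId>/<column>.
// Returns a new table; the input is not mutated.
function applyOp(table: Table, op: PatchOp): Table {
  const segments = parsePointer(op.path);
  if (segments.length > 2) {
    throw new Error(`unsupported path in this sketch: ${op.path}`);
  }
  const [rowId, column] = segments;
  const next: Table = { ...table };

  if (segments.length === 1) {
    // Whole-row operation: replace and remove require an existing row.
    if (op.op !== "add" && !(rowId in next)) throw new Error(`no such row: ${rowId}`);
    if (op.op === "remove") delete next[rowId];
    else next[rowId] = op.value as Record<string, unknown>;
    return next;
  }

  // Single-cell operation: the row must already exist.
  const current = next[rowId];
  if (current === undefined) throw new Error(`no such row: ${rowId}`);
  const row = { ...current };
  if (op.op !== "add" && !(column in row)) throw new Error(`no such column: ${column}`);
  if (op.op === "remove") delete row[column];
  else row[column] = op.value;
  next[rowId] = row;
  return next;
}

// Fold a batch of operations, in order, over the table.
function applyPatch(table: Table, patch: PatchOp[]): Table {
  return patch.reduce(applyOp, table);
}
```

In a streaming setup, each message arriving over the socket would carry one such batch, and the table would be the running result of folding them in order.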
Yes, I realize - Cambria is JSON! Wildcard is HTML! Potluck is language! There is a challenge here, to expand Cambria to ingress / egress HTML and beyond, alongside the normal JSON use case. There is surely promise here - Cambria, Wildcard, and Potluck are all based on bidirectional synchronization, and all operate on the principle of reshaping.
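The shared principle, reshaping that runs in both directions, can be sketched in a few lines. This is an illustration only, not Cambria's lens format or API: the Lens interface, rename, and compose are hypothetical names, and a real lens language covers far more than renaming fields.

```typescript
type Doc = Record<string, unknown>;

// Hypothetical lens shape: a pair of functions that undo each other.
interface Lens {
  forward(doc: Doc): Doc;
  backward(doc: Doc): Doc;
}

// Rename one field going forward, and restore the old name going backward.
function rename(from: string, to: string): Lens {
  const move = (doc: Doc, a: string, b: string): Doc => {
    if (!(a in doc)) return doc;
    const { [a]: value, ...rest } = doc;
    return { ...rest, [b]: value };
  };
  return {
    forward: (doc) => move(doc, from, to),
    backward: (doc) => move(doc, to, from),
  };
}

// Chain lenses: forward runs left to right, backward runs right to left.
function compose(...lenses: Lens[]): Lens {
  return {
    forward: (doc) => lenses.reduce((d, l) => l.forward(d), doc),
    backward: (doc) => lenses.reduceRight((d, l) => l.backward(d), doc),
  };
}
```

For example, compose(rename("title", "name"), rename("href", "link")) turns a scraped record with title and href into one with name and link, and its backward function restores the original field names.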
Cambria lens definitions are much more tightly specified, while Wildcard uses the full JS language to apply its adapters to the page. Otherwise, site adapters and lenses share a common purpose. Additionally, both JSON and HTML are deeply structured, and can be made to comply with schemas:
JSON Schema is a rich and common language for describing the shape of a JSON blob. No surprises.
HTML can be described using numerous schema languages, such as DTD - document type definitions. These are usually gnarly. The Cambria page has a (pessimistic) reference also to XSLT.
So, how do you properly account for this: many HTML pages include embedded JSON, and JSON can easily embed HTML. And these are only two of many schema-bound languages we may care to parse through. YAML and TOML are in the same mold. Markdown can include code samples in any language. And if you manage to approach plaintext documents using a Cambrian lens, perhaps you'd easily reproduce Potluck.
Here is a core issue I'd like to consider:
Maybe someday, a Cambrian lens could recognize meaningful shapes in numerous languages. In this case, do any of XML or JSON or YAML make sense to describe the lens any more? If a lens is responsible for parsing a source in one language, could the lens recognize a sub-measure of the source made up of an embedded language, and slice the pieces accordingly? How could a single lens description address changes bidirectionally (sourcemaps, maybe)? Can a logical calculation, applied to a shape mapped from any source, carry a conclusion through to the original source location (all signs say yes)? How could a schema language possibly map across numerous source languages?
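One small piece of this can be sketched today: slicing an embedded language out of a host document while remembering where it came from, so a change to the parsed value can be carried back to the original location. The sketch below assumes JSON embedded in script tags of type application/json and uses plain string offsets as a poor man's sourcemap; findEmbeddedJson and writeBack are hypothetical helpers, not part of Cambria, Wildcard, or Potluck.

```typescript
// A parsed value plus the span of source text it was parsed from.
interface Embedded {
  start: number; // offset of the first character of the JSON text
  end: number; // offset just past the last character (exclusive)
  value: unknown;
}

// Find JSON embedded in <script type="application/json"> blocks.
// Assumes each block holds valid JSON; JSON.parse throws otherwise.
function findEmbeddedJson(source: string): Embedded[] {
  const open = '<script type="application/json">';
  const close = "</script>";
  const found: Embedded[] = [];
  let cursor = 0;
  while (true) {
    const at = source.indexOf(open, cursor);
    if (at === -1) break;
    const start = at + open.length;
    const end = source.indexOf(close, start);
    if (end === -1) break;
    found.push({ start, end, value: JSON.parse(source.slice(start, end)) });
    cursor = end + close.length;
  }
  return found;
}

// Carry a changed value back to the exact span it came from.
function writeBack(source: string, span: Embedded, value: unknown): string {
  return source.slice(0, span.start) + JSON.stringify(value) + source.slice(span.end);
}
```

Writing back changes the length of the source, so when several spans are edited, applying them from last to first keeps the earlier offsets valid.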
Spurred by success, I proceeded to clone and upgrade gram:luck and gram:shape, alongside gram:card. All of these easily run in a local, offline mode, so I'm happy to head back to the sailboat and pick them to pieces as needed. gram:shape (json-sheets) required a small upgrade to run on Vite.
Much obliged to Geoffrey, and much more to come!