Making 10M government PDF documents searchable – FlowingData

Government organizations love to distribute documents as PDF files, which are easy to forward and print. The problem comes later, when you want to find and access one of them among millions of other files. GovScape, a research project between the University of Washington and Boston University, provides a search interface for the PDFs in the End of Term Web Archive’s 2020 crawl.

The code for GovScape is open source and available on GitHub. I have a feeling such a tool will grow more important going forward.
