One of our clients runs a number of web information portals and needed to propagate information from various sources to these portals as articles. The client wanted to avoid going to each source and manually copying the information into the destination web portal. Unfortunately, there was little or no control over the information sources, and none of them offered an API or web service interface. Fortunately, the information was accessible over the internet using a browser.
We opted for a customized automation robot to handle the information transfer. The robot took the form of an application written in the Java programming language. Its foundation was an integration of a number of open source packages that handled navigation and retrieval of web pages, information parsing and extraction, and database storage via the Java DB SQL database engine. On top of this sat a custom user interface that integrated everything into one application.
The application allowed the client to describe each information source and set its attributes, and to describe each target web portal and set its credentials. The automation robot then used a web screen scraping approach to gather the data from each information source and stored the results in a repository that combined a SQL database with the file system.
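As a rough illustration of the extraction step, the sketch below pulls a title and plain text out of a fetched HTML page. It is not the project's code: the original robot relied on open source parsing packages that are not named here, so plain `java.util.regex` stands in for them, and the class name `ArticleScraper` and its method names are hypothetical. Fetching the page and storing the result are left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; regular expressions stand in for a real HTML parser.
final class ArticleScraper {
    // Captures whatever sits between <title> and </title>, across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches any single HTML tag.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");

    // Returns the trimmed page title, or an empty string if there is none.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Replaces tags with spaces, then collapses runs of whitespace.
    static String stripTags(String html) {
        return TAGS.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }
}
```

A regex approach like this is brittle against malformed markup, which is one reason a dedicated parsing library is the more realistic choice for a production scraper.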
Once the information was gathered in the repository, the application bulk published it to the various target information portals. Each portal carried different information, so it was important that the automation robot keep track of which source fed which target so that it could apply each post to the correct portal.
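The source-to-target bookkeeping can be sketched as a simple grouping step that runs before the bulk publish. This is a minimal sketch under assumptions, not the project's implementation: the `Article` record, the `PublishRouter` class, and the string identifiers for sources and portals are all hypothetical, and the code assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical router that batches gathered articles per target portal.
final class PublishRouter {
    // One gathered article, tagged with the source it came from.
    record Article(String sourceId, String title) {}

    // Groups articles by the portal configured for their source.
    // Fails loudly if a source has no target, so nothing is posted to the wrong place.
    static Map<String, List<Article>> groupByPortal(
            Map<String, String> sourceToPortal, List<Article> articles) {
        Map<String, List<Article>> batches = new LinkedHashMap<>();
        for (Article a : articles) {
            String portal = sourceToPortal.get(a.sourceId());
            if (portal == null) {
                throw new IllegalArgumentException(
                        "No target portal for source " + a.sourceId());
            }
            batches.computeIfAbsent(portal, k -> new ArrayList<>()).add(a);
        }
        return batches;
    }
}
```

Each resulting batch could then be sent to its portal in one pass, which keeps the routing decision separate from the transport used to publish.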
On the portal side, we used the ASP.NET environment with a custom C# service listener to accept the incoming information and write it to the database so that it could be published in the portal pages.
At the completion of this project, we had reduced our client's work to a fraction of what it was when they were moving this information manually.
