Law consulting company based in South America. The client automates the management, classification, and storage of court case files, documents, and contracts of all kinds using AI algorithms.
Created a distributed system architecture with Linux nodes and a dynamic pipeline, which makes it possible to handle peak loads and set job priorities.
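The prioritization idea can be sketched with a small in-memory queue. This is only an illustration, not the production design: the names `Job`, `PriorityPipeline`, `submit`, and `next_batch` are hypothetical, and a real distributed setup would use a shared broker rather than a single process.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                      # lower number = more urgent
    seq: int                           # tie-breaker: keeps FIFO order within a priority
    url: str = field(compare=False)    # payload, excluded from ordering


class PriorityPipeline:
    """Hypothetical single-process stand-in for a distributed job queue."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, url, priority):
        heapq.heappush(self._heap, Job(priority, next(self._counter), url))

    def next_batch(self, size):
        """Pop up to `size` jobs, most urgent first."""
        batch = []
        while self._heap and len(batch) < size:
            batch.append(heapq.heappop(self._heap).url)
        return batch
```

During a peak, a worker node would call `next_batch` with a smaller size, so urgent jobs are still served first while bulk jobs wait in the queue.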
Created an algorithm that scrapes new files immediately during the daytime, adjusting to traffic, and runs bulk updates at night.
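A minimal sketch of how such a day/night switch might look. The window boundaries, the traffic threshold, and the mode names are all assumptions made for illustration, as is the reading of "traffic" as requests per minute on the target site.

```python
from datetime import time

# Assumed window boundaries; the real schedule is not stated in the text.
DAY_START = time(7, 0)
NIGHT_START = time(22, 0)


def choose_mode(now, requests_per_minute, busy_threshold=1000):
    """Pick a scraping mode from the time of day and the current traffic."""
    is_day = DAY_START <= now < NIGHT_START
    if not is_day:
        return "bulk_update"          # night: large refresh jobs
    if requests_per_minute >= busy_threshold:
        return "throttled_scrape"     # busy daytime: fetch new files at a reduced rate
    return "immediate_scrape"         # quiet daytime: fetch new files right away
```

A scheduler would call `choose_mode` periodically and hand the result to the worker nodes.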
Implemented proxies and AI technologies to overcome bot protection and process 14.8 million pages daily. We download about 14 GB of important data per day.
Created a cloud SQL database with daily dumps to Elasticsearch to keep the data we use; files are uploaded directly to Elasticsearch.
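One way a daily dump could be prepared is to turn SQL rows into the newline-delimited JSON body that the Elasticsearch bulk API accepts. The sketch below uses `sqlite3` as a stand-in for the cloud database; the `documents` table, its columns, and the function name `rows_to_bulk_ndjson` are hypothetical.

```python
import json
import sqlite3


def rows_to_bulk_ndjson(conn, index_name):
    """Build a bulk request body: one action line plus one document line per row."""
    conn.row_factory = sqlite3.Row
    lines = []
    # Assumed schema: documents(id, title, court)
    for row in conn.execute("SELECT id, title, court FROM documents ORDER BY id"):
        doc = dict(row)
        doc_id = doc.pop("id")
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The resulting string would be sent to the cluster's `_bulk` endpoint with the `application/x-ndjson` content type; batching and retry handling are left out here.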