February 2012 GTUG Meet
March 2, 2012
There were two talks at the February session (we were supposed to have three, but Mark Ahern was a bit under the weather and couldn’t make it).
First up, Trevor Parsons of LogEntries gave a talk on the issues they had to take into account when building their logging system. Trev opened by talking a little about the large amounts of data that logging systems generate and the difficulty of diagnosing problems within a large body of log data, which is the space LogEntries is providing a solution in. After motivating the system, Trev described some key points in their system design, made with the fallacies of distributed computing in mind: assumptions that application developers often make erroneously. Trev noted that in the world of cloud computing some of these issues are even more acute, as there is a greater dependency on multiple service providers whose systems have limited reliability. Trev described the system they had built and how they had to overcome reliability issues relating to the distributed file system they used to store their log information. Standard solutions such as Hadoop and Cassandra could not meet their needs, so they ended up developing their own. Their system currently scales to handle billions of log messages per day, and they expect to push that up by a few more orders of magnitude in the coming months. [Slides]
After Trev, Paul Phillips from scrazzl gave a talk on scrazzl, the problem they solve and the technology they have built. The problem scrazzl focuses on is identifying when particular products, items or processes are mentioned within research literature: this information is useful to the vendors of those products, who typically have little insight into how their products are used within this community. To solve the problem, the scrazzl team has built a system which can analyse a corpus of research literature, index it and generate different types of analytics relating to the data. Paul described all of the different components of their system and talked through some of the motivations for their design decisions; they make quite extensive use of Apache Solr to analyse and index the corpus, and of MongoDB for storing the outputs of the analysis in general and the analytics information in particular. Paul also talked about deployment, configuration and scaling issues and the approach the scrazzl team takes to them. All in all, a great overview of a complex application with a lot of moving parts. [Slides]
Next month, the talks session will have a security flavour, with Fabio Cerullo talking about the 10 biggest threats to web application security, which come out of a considerable body of work within OWASP. There will also be a talk on Ethical Hacking. Apart from the talks session, there will be an Apps Script Hackathon on March 13th; follow the mailing list for more info.