It’s been WAY TOO LONG since I last posted, but I am finally back! I have cleared all my modules at school, and I finished my Major Presentation two days ago.
Here is how I felt about the entire project. Firstly, it was a great experience working at Hitachi Ltd’s headquarters. I learnt so much during this period. I had always wanted to develop in Ruby, but with all the other wonderful technologies out there, I never had the opportunity. In Japan, however, because the environment and my colleagues all revolved around Ruby, I finally had the enthusiasm to learn it.
Developing Project Lifoge was a great challenge. It wasn’t built on top of a familiar relational, schema-based database like MySQL or Oracle Database; instead, we were developing and testing on top of a cutting-edge cloud database that Hitachi was creating, based on a Key-Value Store (KVS). A KVS, which typically stores each value as an opaque blob of data (a Binary Large OBject, or BLOB), is indeed much more flexible and scalable, but because Hitachi’s instance was a mere prototype, those advantages were not so obvious yet.
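To make the contrast concrete, here is a minimal sketch of the key-value model: instead of rows in fixed columns, each record is just a key mapped to an opaque value (here, a JSON string). The key naming scheme is purely illustrative, not Hitachi’s actual format.

```ruby
require 'json'

# A plain Hash stands in for the KVS: no schema, no tables, no joins.
kvs = {}

# Store a life-log entry as an opaque JSON blob under a composite key.
kvs["user:42:log:1"] = JSON.generate({ "source" => "twitter", "text" => "hello" })

# Reading it back means fetching by key and parsing the value yourself.
record = JSON.parse(kvs["user:42:log:1"])
```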
I was working with two other students from Financial Business Informatics. It was a painful process at times because they were non-developers: many ideas, concepts and IT principles were hard to put across, since they would never fully understand the reasoning behind every decision. However, they also taught me quite a bit, namely how to put complex technical knowledge and terms into the simplest form for customers to understand. It made me look closer at what end-users really need, and at translating complex ideas, technical terms and principles into layman’s terms so that any user can understand them easily.
I developed the backend framework in Ruby so that the other components of the system could interact with it through a REST-style API. The framework then talks to the database, acting as a domain controller, or middleman. In this project I learnt not only to develop in Ruby and to build an API framework that other developers could easily work with; I also learnt to develop the security mechanism from scratch while building a dynamic search engine for our system.
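The middleman idea can be sketched roughly like this: a small gateway class that maps REST-style verbs onto key-value store operations. The class and method names are my own illustration, not the actual framework’s API, and an in-memory Hash stands in for the prototype KVS.

```ruby
# Hypothetical sketch of the "middleman" layer between REST clients
# and the KVS backend.
class KvsGateway
  def initialize(store = {})
    @store = store # stand-in for the prototype cloud KVS
  end

  # GET /records/:key -> fetch the stored JSON value
  def get(key)
    @store[key]
  end

  # PUT /records/:key -> create or overwrite a value
  def put(key, json_value)
    @store[key] = json_value
  end

  # DELETE /records/:key -> remove the entry
  def delete(key)
    @store.delete(key)
  end
end
```

A client component would call `put`/`get` without ever needing to know how the underlying store lays out its data.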
The search engine was tough as well. Instructions were not made clear while we were in Japan, and the instructions first given were that I could develop my own search engine for the system. In January, however, Hitachi wanted us to test the search engine developed by AStar.
The search engine developed by AStar was good in the sense that it used indexing. But why index a search over Facebook and Twitter life logs, where the data is ever growing? Why index when we cannot even weigh the data, deciding whether Facebook results should count for more than Twitter results or vice versa? Besides that, the scheme they used to return results was inefficient.
Why? First, POST the query to the search server, defining whether it is an exact or partial match. The search server searches all the documents, indexes them, and returns a location that stores all the keys to the results. Then, with that location, a GET call retrieves all the keys stored in it. Finally, you GET each individual key to obtain its JSON value before the values are returned to the user. What kind of search is this? Four steps to get my results? CPU load, memory usage and overall resource usage all increase.
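The multi-step flow above can be simulated to show where the round trips pile up. This is not the actual AStar API; it is a sketch against in-memory hashes, with each function representing one network round trip.

```ruby
# Illustrative simulation of the four-step search flow.
SEARCH_INDEX = {} # location => list of matching result keys
DOCUMENTS = { "doc:1" => '{"text":"hello"}', "doc:2" => '{"text":"world"}' }

# Round trip 1: POST the query; the server indexes matches
# and hands back only a *location*, not the results.
def post_query(query)
  keys = DOCUMENTS.keys # pretend everything matches
  location = "results/#{query.hash.abs}"
  SEARCH_INDEX[location] = keys
  location
end

# Round trip 2: GET the location to receive the stored result keys.
def get_result_keys(location)
  SEARCH_INDEX[location]
end

# Round trips 3..N: GET each individual key for its JSON value.
def fetch_values(keys)
  keys.map { |k| DOCUMENTS[k] }
end

location = post_query("hello")
keys     = get_result_keys(location) # extra round trip
values   = fetch_values(keys)        # one more GET per key
```

Every arrow in that chain is latency and resource usage that a single-step API would avoid.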
Furthermore, part of the reason I couldn’t test the search was that bugs caused the server to crash. I hope that when the search is released it will be fully optimized for efficiency and speed. Speed should be weighted heavily as a quality metric, and reliable, accurate results should be considered just as important.
In my search algorithm, you simply POST the query to the search API. You need not define an exact, partial or even phrase match, because the engine handles the query automatically; forcing the user or developer to specify this is too much of a hassle. The algorithm searches all documents and ranks them by relevance, so the data that returns the most relevant results is given HIGH priority, before the entire result set is returned to the user. Indexing is based on a hot/cold scheme, where the best results are kept in the hot index and the worst in the cold index.
Because every user belongs to a social network service, every social network service has a life log, every life log has unique content, and every piece of content can be part of a query, STATIC ranking works badly here. Instead, I used the concept of finding a needle in the haystack.
The results return the location URI of the key, the complete JSON value of the key, and the relevance weighting of the keyword at its location. Besides that, all results are stored in a caching engine so that repeated queries return in a split second. In just one step, developers only need to POST the query to the search API, and the framework handles everything before returning the search results.
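The one-step design can be sketched as follows. This is a hypothetical simplification of my framework, not its actual code: an in-memory Hash stands in for the KVS, term frequency stands in for the relevance weighting, and the cache is a plain Hash keyed by query.

```ruby
require 'json'

# One-step search sketch: a single POST returns ranked, cached results.
class LifelogSearch
  def initialize(docs)
    @docs  = docs # key => JSON string, e.g. "fb:1" => '{"text":"..."}'
    @cache = {}   # query => previously computed results
  end

  # Single POST: the client sends only the query; match mode and
  # ranking are handled internally.
  def post(query)
    return @cache[query] if @cache.key?(query)

    term = query.downcase
    results = @docs.filter_map do |key, json|
      text   = JSON.parse(json)["text"].to_s.downcase
      weight = text.scan(term).length # crude relevance stand-in
      next if weight.zero?
      { "uri" => "/store/#{key}", "value" => JSON.parse(json), "weight" => weight }
    end

    # Highest-weight ("hot") results come first.
    @cache[query] = results.sort_by { |r| -r["weight"] }
  end
end
```

Each result carries the location URI, the full JSON value and the weight, so the caller never needs the follow-up GETs that the four-step flow required.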
All in all, I feel this project was a great success. There are definitely more improvements and security implementations to be made. I see potential in this system, but how does the public see it?