scrum 101

scrum methodology is quite popular and i’d like to dedicate this post to making some sense out of it for those of you who are not familiar with it. if you are a development team, scrum could be an interesting fit for you, or at the very minimum something new to consider and evaluate.

scrum is a software development methodology that works along the lines of lean manufacturing or agile. to make sure we all speak the same language: from a superficial, high level viewpoint, agile, lean and scrum are the same thing. if you are wondering about the origin of the name, scrum is borrowed from rugby, where the players lock together and try to get hold of the ball by moving it with their feet. small iterations. makes sense?

the methodology assists in defining the development life cycle and its stages, the key players and roles, and how responsibility is delegated. scrum also assists in figuring out how to stay on top of a project and its progress, how to address and make changes/enhancements to the development plan, and how to deal with risks. taking a step further, scrum can help lead development teams, keep them engaged in drawing conclusions, and improve both product and process on a regular basis.

if you were scrum, the world would be roughly divided into you and waterfall. “you” means agile methodologies such as kanban and XP. waterfall is a more sequential approach from the 70s that was (probably) influenced by traditional manufacturing, and is similar to the approach of building a house:

  • there are restrictions on the order of operations (one cannot lay the roof before foundation for example)
  • mistakes are freaking expensive so better get it right by carefully planning and quality measuring your work
  • much repetition (many doors, many windows) so consolidating tasks means efficiency

so with waterfall one first gathers all the requirements for a product, then architects the solution, then dives into a detailed, technical design of each component, and then it is code it up, integrate, test test test, repair/fix and release.

with agile, developing software is more like designing a department store:

  • usually there are loose restrictions on the order of the tasks at hand
  • a wide array of features
  • detailed and strict planning may fail. small incremental steps and proper adjustments along the way work better (scrum anyone?)
  • centralizing tasks helps to a certain extent

scrum in action:

  • with agile the team works in sprints (2 weeks minimum for us), where each sprint gets us one step closer to our goal
  • effectiveness is key: how many features (stories) were coded into the system. fewer lines of code in general, more code that does what the end user really needs
  • no elaborate MRD/PRD. with agile one maintains a backlog and direct communication. we hold daily meetings and sprint planning, and we retrospectively learn from our mistakes and successes.
  • flat hierarchy across the team, and more responsibility is handed off to the developers. the team self-manages using a structured process
  • a team is composed of people with complementary skills, so each team can get stuff done on its own
so agile is more of a philosophy than rules right?
  • every activity is timeboxed, and activities are prioritized from the most important to the least. when time runs out we are hopefully better off than before we started, because the product work was done up front to make sure the most important features are at the top of the list (i.e. the backlog)
  • with agile the developers are encouraged to write only what is absolutely necessary, and most probably we will revisit this code later on for changes and enhancements. this is where a well thought out QA process is essential
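
to make the backlog idea concrete, here is a minimal sketch in python. it is not part of any scrum tool: the `Story` class, the `plan_sprint` helper, the sample stories and the 10 day capacity are all made up for illustration. it simply walks the backlog from most to least important and stops when the time budget runs out.

```python
from dataclasses import dataclass

@dataclass
class Story:
    title: str
    priority: int         # 1 = most important (hypothetical convention)
    estimate_days: float  # rough size of the story

def plan_sprint(backlog, capacity_days):
    """Take stories in priority order and stop at the first one that no longer fits."""
    planned, remaining = [], capacity_days
    for story in sorted(backlog, key=lambda s: s.priority):
        if story.estimate_days > remaining:
            break  # time ran out; everything below waits for a later sprint
        planned.append(story)
        remaining -= story.estimate_days
    return planned

backlog = [
    Story("export report as pdf", priority=3, estimate_days=4),
    Story("user login", priority=1, estimate_days=5),
    Story("password reset", priority=2, estimate_days=3),
    Story("dark mode", priority=4, estimate_days=2),
]
sprint = plan_sprint(backlog, capacity_days=10)
print([s.title for s in sprint])
```

whatever did not make it in stays on the backlog and gets re-prioritized before the next sprint.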

think of scrum and agile as a framework for getting the job done. depending on the dynamics of your team and the size of your company, agile may be what you want to implement. i think it works very well for startups and for collaborating with small teams when outsourcing projects. at the end of the day, agile and waterfall are both ways to increase productivity and allow developers to make the best of their time.

good luck!

healthcare and big data

everywhere you turn people are talking about big data, hadoop and sharding. rightfully so. in this day and age managing a lot of data is not an easy task, as performance and scalability are key. traversing large data sets, dividing them into tiny sections and distributing the load among many machines (processors) is nothing new.

hadoop exists in order to solve specific problems and has emerged out of necessity. what hadoop does is provide the infrastructure to connect multiple (cheap) servers into a coherent environment in which i/o and cpu heavy problems (algorithms) can be solved.

it all started around 2004, when doug cutting, the author of the document indexing project lucene, decided to make it possible to achieve the same goals in a distributed environment, building on ideas google had published. hadoop, BTW, is named after his son’s yellow toy elephant. in 2006 yahoo hired doug to improve the project so it could index the entire internet, and backed it as an open source project. that day marked the start of the revolution.

at its core hadoop includes two projects: one for distributed storage and one for distributed computing. around those two a vast number of projects have evolved (and are still evolving).

HDFS: hadoop distributed file system
this file system is designed to store large files and enable large, efficient reads and writes. this is done by dividing each file into sizable chunks, where each chunk is normally stored on 3 nodes, which can be anywhere. a “name node” keeps the mapping between a file, its constituent pieces and the data nodes on which they are stored.
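
here is a toy sketch of that idea in python. it is not hadoop code: the function names, the node names, the tiny 8 byte chunk size and the round-robin placement are all assumptions made for illustration (real HDFS chunks are far larger and placement is smarter). it only shows a file being cut into chunks and a name-node style table recording which 3 nodes hold each chunk.

```python
import itertools

BLOCK_SIZE = 8     # bytes; tiny on purpose, real hdfs chunks are far larger
REPLICATION = 3    # the "normally 3 nodes" from the text

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut the file content into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def build_name_node_table(filename, data, data_nodes, replication=REPLICATION):
    """Map (filename, chunk index) -> the data nodes holding a copy of that chunk."""
    if len(data_nodes) < replication:
        raise ValueError("need at least as many data nodes as replicas")
    next_node = itertools.cycle(data_nodes)   # naive round-robin placement
    table = {}
    for index, _chunk in enumerate(split_into_blocks(data)):
        table[(filename, index)] = [next(next_node) for _ in range(replication)]
    return table

content = b"hello hadoop, hello hdfs!"
nodes = ["node-a", "node-b", "node-c", "node-d"]
table = build_name_node_table("greeting.txt", content, nodes)
for key, holders in table.items():
    print(key, holders)
```

lose any single node and every chunk still has two copies elsewhere, which is the whole point of the replication.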

mapReduce: distributed computing
an API for writing programs that run in parallel. the developer really only needs to write two simple functions, map and reduce, each of which handles a single document (i.e. element of data). the framework runs them on multiple machines and takes care of the timing and of handling errors and failures (network, i/o, etc). this allows for simple parallel batch processing, where a “job tracker” synchronizes the execution of the batch jobs, and each job is subdivided into smaller tasks which are handled by the “task trackers”.
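
the classic way to picture map and reduce is a word count. the sketch below is plain python running in a single process, not the hadoop API (which is java): `map_fn`, `reduce_fn` and `run_job` are made-up names, and the grouping step in the middle stands in for what the framework would do across machines.

```python
from collections import defaultdict

def map_fn(document):
    """Handle one document: emit a (word, 1) pair for every word in it."""
    for word in document.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Handle one key: combine all the values emitted for that word."""
    return word, sum(counts)

def run_job(documents):
    """Stand-in for the framework: run map, group by key, then run reduce."""
    grouped = defaultdict(list)
    for document in documents:            # on a cluster, these run in parallel
        for key, value in map_fn(document):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())

docs = ["big data is big", "hadoop handles big data"]
counts = run_job(docs)
print(counts)
```

notice that neither function knows anything about machines, networks or failures. that is the part the framework takes off your hands.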

over time yahoo and facebook (to mention a few) wrote their own drivers on top of HDFS and mapReduce and have shared their work with the community. so hadoop is a code name for a set of technologies that harness the computing power of many machines to perform simple tasks in parallel. hadoop emerged from the world of unstructured data, where hundreds of millions of pages are analyzed. today big data is being implemented and researched in every facet of the economy, including healthcare.

why we use mongo DB

by its own description, mongo is a scalable, high performance, open source, schema-free, document oriented database. so the one size fits all philosophy doesn’t work anymore, does it? non relational databases scale horizontally much better: just add another machine and you are good to go. and these days, when big data is a big deal, speed, performance, flexibility and scalability are the name of the game. think about it… no schema, no concern with transactions. this is the commoditization of databases. what mongo tries to do is perform like a key-value store while offering much of the functionality of an RDBMS.

speaking of “traditional” database capabilities, mongo supports indexes and has failover/replication support. data is stored as documents in a binary JSON format (BSON). yes, mongo “gets” JSON out of the box. is your JSON valid? good. you can now import and query it. no schema means no more ‘alter table’ crap, the query syntax is javascript based, and you can nest your queries as much as you want. moving right along, gridFS can store large binary objects efficiently. images. videos. whatever. just throw it in there.
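
to get a feel for what schema-free documents and nested queries mean, here is a small pure python sketch. it does not talk to mongo and is not its driver API: `get_path` and `find` are hypothetical helpers and the sample people are invented. the only thing borrowed from mongo is the idea of reaching into a nested document with a dotted path like `address.city`.

```python
import json

raw = """
[
  {"name": "ada", "age": 36, "address": {"city": "london"}},
  {"name": "linus", "langs": ["c"], "address": {"city": "portland"}},
  {"name": "grace", "age": 45}
]
"""
people = json.loads(raw)   # no schema: every document carries its own fields

def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; None if any part is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc

def find(collection, query):
    """Yield the documents whose fields equal every value in the query."""
    for doc in collection:
        if all(get_path(doc, path) == wanted for path, wanted in query.items()):
            yield doc

londoners = list(find(people, {"address.city": "london"}))
print([doc["name"] for doc in londoners])
```

three documents, three different shapes, and nobody had to alter a table to make that work.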

documents are just like records, only mongo has them as binary JSON objects. collections are your old school tables, if you will. when you query mongo you get a cursor back and not the records per se, and you iterate over the result set like a champ. you guessed it: no more loading everything into memory, just what you need. this is a big victory for the performance gods. mongo is also wonderful for analysis, as it can double as a data warehouse. dump in your JSON and analyze the hell out of the data.
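
the cursor idea maps nicely onto a python generator. the sketch below is an analogy only, not mongo’s cursor implementation: `find_lazy`, the `visits` data source and its fields are made up. the point is that results are produced one at a time as you ask for them, so a million records never have to sit in memory together.

```python
def visits():
    """Pretend data source: a million records, produced one at a time."""
    for i in range(1_000_000):
        yield {"visit_id": i, "duration_min": i % 90}

def find_lazy(collection, predicate):
    """Cursor-like generator: yields matches on demand instead of building a list."""
    for doc in collection:
        if predicate(doc):
            yield doc

cursor = find_lazy(visits(), lambda doc: doc["duration_min"] > 85)

# pull only the first three matches; the rest of the data is never touched
first_three = [next(cursor) for _ in range(3)]
print([doc["visit_id"] for doc in first_three])
```

keep calling `next` (or loop over the cursor) only for as long as you actually need more results.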

mongo is not good at handling transactions, nor at maintaining the integrity of relationships between data. you can find out more at