
All posts by James Bray

I'm a Chicago-based software architect. My areas of interest are cloud enterprise solutions, healthcare technology, domain-driven design, and mobile applications.

Freelancers, Requirements, Website

Thinking Through the Website Ask

As a technical consultant with both engineering and design expertise, I often see requests for proposals for websites that claim to be “fully specified” or “easy to develop”.  When I look at these requests, more often than not there is little or no true specification and nothing to indicate the website will actually be easy to develop.  Many individuals and businesses looking for freelancers to build a website have not thought through, in detail, what they want and how to get it.  The requests for proposals are often a collection of ideas rather than a coherent vision of the desired end product.  Many prospective clients need business consulting as much as they need technical and design services, so I often find myself educating a potential client on how to focus on what they want and how to obtain it as an end product.

There is a temptation for consultants to say yes to a project and worry about the details later.  This is almost always a mistake.  Several key steps should be taken when working with a client to develop a website.

Take the Time to Define the MVP

One of the first things I try to determine is what the client is attempting to achieve by creating a website.

  • Work with the client to establish objectives and key results (OKRs), then determine how a website will help the client achieve those OKRs.
  • Ask probing questions to surface any key assumptions that the client has made regarding the website and what is expected.
  • Work with the client to determine the minimum features the website should have in order to meet the OKRs.  In industry parlance, this is the minimum viable product (MVP).  This will help minimize the time to market, assuming that is important.
  • If the client has a particular consumer, user, or audience in mind for the website, determine who this is or who they are.  If possible, develop some personas that can be used to test some assumptions during the design and development process.

Try to keep the MVP definition step short and sweet.  This step should be thorough, but it should not get bogged down in analysis paralysis.  I typically spend anywhere from one to four days working with the client on this, depending on the complexity and size of the project.

Website Development Should Be Iterative

There are many strategies for developing a website.  I prefer an iterative approach.  Once the MVP has been defined, rapidly build out the website and begin testing the assumptions that have been made.  Have the client select some test users to interact with early iterations of the website.  I recommend doing this at least once before moving to a full production launch of the site.

  • Have people who match the personas use the website.
  • Get feedback and validate the assumptions made.
  • If some assumptions are revealed to be true, continue along the same path.
  • If some assumptions are revealed to be false, evaluate and pivot if necessary.

Repeat the above steps until the client feels comfortable that the website will meet the OKRs defined earlier.

Think About Day-2 Operations

One thing that often gets overlooked in website development for non-enterprise-level websites is day-2 operations.  Day-2 operations are all of the activities that are necessary to maintain a website once it has been launched.  Websites don’t maintain themselves, and I’ve seen good sites go stale because they aren’t maintained.  This ultimately can negate the value of investing in the website in the first place.

  • Does the client expect you to maintain the website once it has been delivered?  If so, is there a service contract between you and the client specifying the service agreement?
  • Is an operations or administration manual included in the deliverables for the client?  I have found that including an easy-to-understand document on how to maintain the website makes for happier clients and results in fewer requests for free support once the site has been delivered.

Start with a Soft Launch

I prefer a soft launch rather than an initial big-bang launch to introduce a website to its intended audience.  I define a soft launch as either a non-public launch of the website or a launch to a subset of the intended target audience.  A soft launch allows the new website to go through a relatively low-risk shakeout phase.  If any last-minute issues are discovered, those issues can be fixed without impacting the client’s entire audience.

Conclusion

The things discussed in this article are in no way comprehensive. Many other factors go into preparing to develop a website, but if the items mentioned in this article are considered, the entire process will be easier and the chances of success will increase.

Healthcare

Information Technology in American Healthcare: A Brief Retrospective and What the 2020s May Bring

The American healthcare industry is one of the largest and most complex in the world.  Therapeutic and diagnostic advances in American healthcare often filter out to the rest of the world and become standards.  The irony is that the technology used in the American healthcare industry has traditionally lagged behind that of other industries, such as financial services, manufacturing, and telecommunications, from the late 20th century into the 21st.

A Brief History

Health technology in general dates back to ancient times and is as old as civilization.

The healthcare industry has been leveraging information technology for as long as other major industries.  During the latter half of the 20th century, medical advancements were steady and methodical, with fewer of the breakthrough discoveries of the early 20th century, such as the development of therapeutic antibiotics (penicillin).

Early Medical Technology Advancements

While it is probably impossible to know definitively when the first artifact of medical technology was created, we have examples of early medical technology dating back 3000 years.

The Cairo Toe

One of the earliest known artifacts of medical technology is arguably the “Cairo toe”, which was discovered on the mummy of a woman who lived sometime between 950 BC and 710 BC.  This prosthetic toe is made of wood and leather and appears to have been handcrafted to fit the woman who wore it.

The Cairo Toe (University of Basel, LHTT. Image: Matjaž Kačičnik)

There are earlier examples of what appear to be artificial body parts, but these earlier artifacts seem to have served aesthetic purposes, if they were used at all.  Their designs make them impractical for frequent use due to their weight, discomfort, and general awkwardness.

Medical technology continued to develop over the centuries with the development of the first stethoscope in 1816 and the first medical use of X-rays in 1895.  These are only a few of the early technical achievements.  There are many others, but a comprehensive discussion of the history of early medical technology is beyond the scope of this article.

Medical Technology in the 20th Century

The twentieth century saw an explosion in medical advances and in the use of technology in medicine.

Dr. Alexander Fleming, a professor of bacteriology at St. Mary's Hospital in London, UK, discovered the antibiotic properties of a rare strain of the Penicillium notatum mold in 1928, which led to the first true modern antibiotic, penicillin.

In 1953, Dr. James Watson and Dr. Francis Crick, drawing on the X-ray diffraction work of Dr. Rosalind Franklin, discovered that the DNA molecule has a double-helix structure.  Their discovery built on the work of the Swiss chemist Friedrich Miescher in the 1860s.  The work of Watson and Crick would provide a portion of the foundation for the Human Genome Project in the late 20th century.

The Human Genome Project is probably one of the most ambitious and pivotal efforts in the history of scientific and medical research.  The project started on October 1, 1990 and was completed in April 2003.  The result was a genetic blueprint for a human being.  The project led to significant advances in technologies used to sequence DNA.

The first printout of the human genome, presented as a series of books in the ‘Medicine Now’ room at the Wellcome Collection, London (photo: Russ London)

Early 21st Century Medical Advancements

The first two decades of the 21st century saw many innovations.  These include the introduction of mobile technology in the healthcare space.

2020: A Pivotal Year in Health Technology

2020 forced change on many aspects of society, with healthcare being one of the key areas where major changes took place or were accelerated.

The 2020s: An Inflection Point

The COVID-19 pandemic accelerated some trends that started prior to it.  Although partnerships between healthcare and technology companies are nothing new (Syntex Pharmaceuticals (Roche) and Varian partnered to form Syva back in the 1980s), the scope and urgency of these partnerships greatly increased in 2020.

Healthcare Supply Chain Breakdown and Failures

Early in the pandemic, there were shortages of PPE (Personal Protective Equipment) which greatly increased the risk to medical professionals and impeded care to the infected.

Accelerated Research and Development and Path to Production

The urgency of the pandemic necessitated an acceleration in the development of diagnostic and treatment tools to address the healthcare crisis.

The Road Ahead

Technology offers great potential benefits to the healthcare industry and enormous challenges.  Pharmaceutical companies, medical device companies, and healthcare providers have seen what is possible based on the advances made during the COVID-19 pandemic.

Telemedicine

The remote delivery of medical care via telemedicine took off in 2020 at a pace that was not expected for at least another five years.  Going forward, telemedicine is viewed as a cost-effective and convenient means of delivering healthcare services to patients under certain circumstances.

The delivery of basic medical diagnostic tests and monitoring via mobile devices took off during the pandemic of 2020.  This laid the technology platform foundation for more extensive use of mobile health tools in the 2020s.  The general public has gained a greater acceptance of receiving medical services via their personal mobile devices.  Software as a medical device on personal mobile devices is likely to become common by the end of this decade.  Mobile devices, including wearables, are likely to include biometric sensor technology to allow for more extensive use of remote diagnostic tools.

Machine Learning in Healthcare

Machine learning tools are already used in the healthcare space and this is likely to expand greatly over the next 10 years.  Advances in big data and data analytics have laid the foundation for the use of machine learning to assist medical professionals in medical diagnostics and recommending proper medical treatment protocols for a given condition in a patient.

The Open Source Model in Healthcare

The use of open platforms and tools has the potential to increase innovation and reduce costs in the healthcare space.  The success of the open source model in healthcare will greatly depend on an evolution of the culture in this space toward less proprietary solutions.

End to End Preparedness

There is a general consensus that the world was ill-prepared for the impact of COVID-19, and a recognition of the need for better preparedness.  There is currently no agreement, however, on what preparedness for a future global medical crisis should look like.  Some general recommendations include hardening the supply chain for medical supplies and raw inputs, having contingency plans for remote collaboration and research, and creating disaster recovery strategies.

Cloud Computing in Healthcare

Cloud technology, as in other areas, will be one of the foundational technologies underpinning healthcare in the coming years and perhaps decades.  It has been, and will continue to be, implemented in various forms, including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).


There have been several drivers for the adoption of cloud technologies in the healthcare sector.  Fluctuations in demand, and the cost-effectiveness of meeting those changing demands, have been one driver.  In this respect, the healthcare industry is no different from any other industry facing similar demand fluctuations.  The personalization of healthcare has been another.  Advances in cloud technology and data analytics have allowed healthcare providers to deliver personalized care to patients more cost-effectively than would otherwise be possible.

General architecture of mobile cloud computing (image: Dinhthaihoang)

References

Images

  1. The Cairo Toe (University of Basel, LHTT. Image: Matjaž Kačičnik) (https://www.smithsonianmag.com/smart-news/study-reveals-secrets-ancient-cairo-toe-180963783/)
  2. The first printout of the human genome to be presented as a series of books, displayed in the ‘Medicine Now’ room at the Wellcome Collection, London. The 3.4 billion units of DNA code are transcribed into more than a hundred volumes, each a thousand pages long, in type so small as to be barely legible. Photo by Russ London (https://commons.wikimedia.org/wiki/File:Wellcome_genome_bookcase.png)
  3. General architecture of Mobile Cloud Computing. BTS: Base Transceiver Station; AAA: Network Authentication, Authorization, and Accounting; HA: Home Agent. Dinhthaihoang (https://en.wikipedia.org/wiki/Mobile_cloud_computing#/media/File:Mobile_Cloud_Architecture.jpg)

Glossary

  • Health Level 7 (HL7): standards for electronic exchange of clinical, financial, and administrative information among health care oriented computer systems.

  • Health Insurance Portability and Accountability Act of 1996 (HIPAA): a federal law that required the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. The US Department of Health and Human Services (HHS) issued the HIPAA Privacy Rule to implement the requirements of HIPAA. The HIPAA Security Rule protects a subset of information covered by the Privacy Rule.
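To make the HL7 entry concrete, here is a minimal sketch of parsing an HL7 v2-style pipe-delimited segment in Python.  The sample segment and its values are fabricated for illustration, not taken from any real interface.

```python
# Minimal, illustrative parse of an HL7 v2-style patient identification
# (PID) segment.  HL7 v2 messages are pipe-delimited; components within
# a field are separated by carets.  This sample segment is fabricated.
segment = "PID|1||12345^^^GeneralHospital^MR||Doe^John||19800101|M"

fields = segment.split("|")
segment_type = fields[0]                  # "PID"
patient_id = fields[3].split("^")[0]      # first component of PID-3 (identifier list)
family, given = fields[5].split("^")[:2]  # PID-5 (patient name) components

print(segment_type, patient_id, family, given)
```

Real HL7 interfaces also handle repetition, escape sequences, and message-level framing, so a production system would use a dedicated HL7 library rather than naive string splitting.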

Freelancers

Having Clear Goals Increases the Chance of Success When Hiring Freelancers

I have worked in large enterprises and as a freelancer for small businesses.  One key ingredient for success in both of these environments is having clear goals for the project.

 

Large enterprises generally have extensive resources and can have business stakeholders, product managers, engineers, designers, and other professionals work on a project.  These project team members have the expertise to ask the right questions before a project begins to increase the likelihood of success.

 

Small businesses and individuals who hire freelancers often have a general idea of what they want, but no clear, actionable goal.  This can be true of large enterprise projects also, but the team of professionals present on most large projects can quickly ask the right questions to make the goals and objectives very clear before development begins.  In the case of a small business, there may be only one or two people to fill all of these roles.  This can lead to a project starting with only vague goals and becoming a prototyping exercise rather than a project that delivers production-ready code.  Prototyping and proof-of-concept (POC) projects are perfectly fine, as long as the person hiring the freelancer expects a prototype at the end of the project.  This is often not the expectation.

 

Some critical questions to consider before hiring a freelancer include the following:

 

  1. What is the goal of the project?
  2. What is the expected deliverable?
  3. What assets or resources will be provided to the freelancer?

 

Let’s take a closer look at these questions.

What is the goal of the project

This is the single most important question of any project.  Here are some questions that should be asked about the project goal.

  1. Is the goal of the project to produce a part of a larger project?
  2. Is the goal to produce a finished product that will be used by consumers?
  3. Is the goal to produce something that will be monetized?
  4. Is the goal to increase brand recognition?

 

Some basic steps that should be taken are:

  1. Identify the stakeholders.  That may be just you, but if there are others, make sure they are included in the early discussions and that all stakeholders are aligned on the project goal(s).
  2. Identify any known risks and let the freelancer know about them early.  Goals may need to be adjusted based on those risks.

What is the expected deliverable

This may seem like it should be obvious to the business or individual hiring a freelancer, but often it is not.  I have often heard clients say, once a project has started, “I don’t know what I want, but I’ll know it when I see it.”  In cases such as this, it is probably in the client’s interest to think through what they actually expect to be delivered.  This may require creating some prototypes, or hiring someone to create prototypes and concept applications.  This can help in the journey to defining a clear deliverable.

What assets or resources will be provided to the freelancer

What will the freelancer be starting with when taking on your project?  If the project includes a front end, will UI assets be provided?  If the project requires some sort of remote data storage, as a dating app or financial app would, will that be provided up front?  If the freelancer is expected to provide these things, the scope of the work increases, which translates into higher costs and longer timelines.

 

Plan Template

 

  • Key Stakeholders
  • Key Goals
  • Anti-Goals
  • Roles and Responsibilities
  • Scope of Work
  • How the Project Will Be Managed
  • Risks
  • Initial Backlog of Stories/Tasks

 

Free and Inexpensive Resources

There are many free and inexpensive resources for managing a software project.  If you hire freelancers on sites such as Upwork or Fiverr, you can use the free tools they provide on their respective platforms.  I describe some free and low-cost third-party tools below.

Project Story/Task Management

Trello is a very flexible tool for managing user stories and streams of work.  It can be used free of charge for small projects.

Trello Board – https://trello.com

Source Code Repository

The source code repository is where code should be kept during the development process.  This allows you and the freelancer to have access to the code at any point in time during the project.  Some Git repository options are listed below:

GitHub – https://github.com/

GitLab – https://gitlab.com/

Bitbucket – https://bitbucket.org/

Communication

Slack has become the preferred communication tool in the development community in recent years.  Slack offers free accounts for small teams.

Slack – https://slack.com/

 

Zoom is the tool of choice for quick, real-time developer collaboration.  I have found it vastly superior to other collaboration tools on the market.  Zoom allows small teams to use a free account for collaboration conferences of up to 40 minutes.  It includes text chat, audio, video, and screen sharing.

Zoom – https://zoom.us/

Other Resources

The resources listed above should be more than adequate for almost any small-business software development project.  Other tools that can be used include Jenkins, CircleCI, or Concourse for continuous integration and continuous delivery.

If you have more than one freelancer working on your team, you may want to consider doing weekly retrospectives using a tool like FunRetro to see what worked, what didn’t, and what should change from week to week.

Conferences, SpringOne 2017

Main Stage – Pivotal Cloud Foundry 2.0

Rob Mee opened the Main Stage talks on Tuesday, December 5, 2017, by letting the audience know how far Pivotal had progressed over the past year, since the 2016 SpringOne Platform conference in Las Vegas.  He introduced Onsi Fakhouri, the head of R&D at Pivotal.

Onsi emphasized the culture at Pivotal, which values and encourages rapid velocity in innovation and development.  Key announcements included the introduction of the Reactor technologies, which bring non-blocking, asynchronous patterns to the Spring ecosystem.  Another key announcement was the update of the Spring Tool Suite to version 4.

A number of key partnerships were announced, including ones with IBM and Microsoft.  IBM WebSphere Liberty will now be supported as a first-class citizen in PCF.  Also, as part of the multi-cloud support that is core to Pivotal Cloud Foundry (PCF), the Azure stack will be supported as a first-class citizen in the next major PCF release.  Microsoft had a running Azure stack on the show floor for attendees to examine and try out while at the show.

Another major announcement was the support of Kubernetes on the PCF platform.  Kubernetes will work in conjunction with BOSH as part of the new PKS offering.

In all, there were over 20 product and service announcements during this keynote.  The full keynote can be viewed on YouTube at https://youtu.be/_uB5bBsMZIk.

 

Conferences, SpringOne 2017

SpringOne 2017 Conference Open Discussion

Today is the official start of the SpringOne 2017 conference at the Moscone Convention Center in San Francisco.  At 2:00 PM there was an open discussion on topics related to the Spring Framework, cloud technologies, and DevOps.  As usual when I attend these events, I will post any public announcements along with my comments on my blog.  One disclaimer: I am now a Pivotal employee, and Pivotal is the primary sponsor of SpringOne.  I hope my readers enjoy my commentary on this show as much as I am enjoying the conference.

Conferences

Google Cloud Summit Chicago 2017 – Cloud Spanner Database

Spanner is Google’s new DBaaS offering.  Google touts it as having the best features of both a relational database and a document or NoSQL database.  So, why did Google create Spanner?  Google needed:

  • Horizontal scalability
  • ACID transactions
  • No downtime
  • Automatic sharding
  • Seamless replication

Cloud Spanner is Google’s mission-critical relational database service.  It was originally built to run Google AdWords internally, but is now exposed as a public service.  It is a multi-regional database and can span millions of nodes.

Open Standards

  1. Standard SQL (ANSI 2011)
  2. Encryption, Audit logging, IAM
  3. Client libraries (Java, Python, Go, Node.js)
  4. JDBC driver

Architecture

Spanner is provisioned in instances.  The instances exist in different zones.  This architecture allows for high availability.  The customer can choose which regions the database instances are placed in.  Writes to the database are synchronous and are replicated across nodes.

Spanner supports an interleave data layout.  This specifies that data should be written in close proximity on disk.  The result is much better read performance.  Spanner is designed for large amounts of data.  It is not as efficient for small data sets.
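The interleaved layout is declared in the schema itself.  A hypothetical parent/child pair might be sketched in DDL as follows (table and column names are illustrative, not from the talk):

```sql
-- Hypothetical Spanner schema: Orders rows are physically stored next
-- to their parent Customers row, improving read locality for joins on
-- the shared key prefix.
CREATE TABLE Customers (
  CustomerId INT64 NOT NULL,
  Name       STRING(MAX)
) PRIMARY KEY (CustomerId);

CREATE TABLE Orders (
  CustomerId INT64 NOT NULL,
  OrderId    INT64 NOT NULL,
  Total      FLOAT64
) PRIMARY KEY (CustomerId, OrderId),
  INTERLEAVE IN PARENT Customers ON DELETE CASCADE;
```

Note that the child table's primary key must begin with the parent's primary key, which is what lets Spanner co-locate the rows on disk.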

Conferences

Google Cloud Summit 2017 Keynote

This is the first Google Cloud Summit in Chicago, and it is in some ways a coming-out event for Google Cloud.  The keynote speaker is Scott McIntyre, Director, Google Cloud.  There have been 500+ releases in the past six months, and 6.5 million businesses use GCP today.  In addition to its customer base, Google has built an ecosystem of business partners for its Cloud Platform.  Google believes that all companies either are or will become data companies.  The transportation industry provides a dramatic example of this.

The advantages of Google Cloud include optimizing business operations by hosting business infrastructure services in the cloud.  Collaboration is another area that Google touts as an advantage of their cloud services.  The acceleration of business is also one of their advantages.  They follow a philosophy of openness, which goes beyond just open source software.

The focus of the Summit is to promote the Google Cloud Platform to the Chicago and larger Midwest business community.

Ulku Rowe, Technical Director, Financial Services Office

The five drivers of the cloud decision are:

  • Reliability
  • Security
  • Excellent Support
  • Performance
  • Cost Effectiveness

Google serves over 1 billion end users today, and 40% of the world’s internet traffic goes through Google’s network.

Google also provides a private, ultra-fast backbone, which it touts as more secure than many public access points.  Google provides layered, defense-in-depth security and follows a security paradigm of least trust.  Google introduced the Titan chip, which is used to secure hardware.

Security innovation

  • Identity-Aware Proxy – application-level security, which is more granular than a corporate VPN
  • Data loss prevention API

Security is built into every layer, including:

  • GCP
  • G Suite
  • Chrome and Android – scan 6 billion mobile apps to make sure they are not infected with malware

Roberto Bayardo, Distinguished Software Engineer

Bayardo has worked on machine learning within Google.  One myth about breakthroughs is that they happen in isolation.  With G Suite, Google is dedicated to saving businesses time via collaboration.  Files in G Suite act as conversations, and multiple people can collaborate on a single document in real time.

Smart Reply in Gmail uses machine learning to automatically reply to emails based on the user’s observed behavior.  A demo was performed in which Google Sheets used natural language processing to understand a question a user asked and provide an answer in real time.

 

Andrew Lewis, Whirlpool, Senior Manager

Whirlpool went live with G Suite about three or four years ago; that was their first foray into the cloud.  They built a team called “winning workplace”, made up of representatives from several departments.  Google Hangouts was one technology that made a huge difference at Whirlpool; before Hangouts, video conferencing capabilities were limited.  Using Google Cloud has changed the business conversation at Whirlpool.

Scott McIntyre – Summary

Scott McIntyre talked about the tools made available on Google Cloud Platform.  App Maker allows developers to write an app once and have it work across multiple platforms.  Cloud Search is a tool that allows users to search across all of their G Suite content.  G Suite uses simple controls to allow for easy administration of the tools.

Miles Ward, Director of Solutions, Google Cloud

There has been an acceleration of businesses trying to use the cloud to increase productivity.  Google has worked to democratize data within organizations to increase collaboration and innovation.  Google develops tools and platforms to facilitate this and open sources much of this technology.

Cloud Spanner

This is billed as the first horizontally scalable relational database.  Cost is calculated in real time as the database architecture is specified, and databases can be deployed within seconds.  The database appears to be lightning fast: a query over several terabytes of data completed in less than a second.

Cloud AI

Google provides large, vendor-free datasets that companies can use.  Google also provides machine learning training to allow customers to come up to speed rapidly.  Customers can use pre-trained models or train their own.  Google has also open-sourced the TensorFlow machine learning platform.

Demo

Video Intelligence API – can analyze a video and provide context and relevance data on it.  The API can also scan a video catalog and retrieve relevant video content.

The platform that Google provides allows developers to focus on just designing and developing code rather than getting distracted by provisioning and managing infrastructure.

Bradley Burke, CTO, Network Insights

Network Insights has a large streaming-analytics platform as well as a large data platform, both enabled by Google Cloud Platform.  GCP allowed the organization to scale its business.  Their analysts can now write queries against BigQuery, which has brought data science to the masses.

Kris Baritt – Technical Director, Office of the CTO

Google spends significant resources on the partnerships it has with its customers.  One goal of the relationship is “getting out of the software jail”.  There should be a shared success model.  The partnership should be a commitment, not a mandatory sentence.  There should be flexible deployment models and flexible support models.

There are 10+ years of open source projects in:

  • Linux
  • Python
  • C++
  • Git
  • Kubernetes – tool to manage containers

Google does not dictate VM configurations; customers can configure their VMs on Google Cloud.  Google has per-second billing.

Google announced “committed-use discounts” yesterday.  On average, customers see 60% savings and $0 startup costs versus on-premises infrastructure.
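As a rough illustration of what per-second billing means in practice (the hourly rate below is hypothetical, not an actual GCP price):

```python
# Compare per-second billing with traditional per-hour billing for a
# short-lived VM.  The hourly rate here is hypothetical, purely for
# illustration.
HOURLY_RATE = 0.10           # hypothetical $/hour for a VM
runtime_seconds = 75 * 60    # the VM ran for 75 minutes

# Per-second billing charges only for the time actually used.
per_second_cost = HOURLY_RATE / 3600 * runtime_seconds

# Per-hour billing rounds the runtime up to whole hours (2 here).
per_hour_cost = HOURLY_RATE * 2

print(f"per-second: ${per_second_cost:.4f}, per-hour: ${per_hour_cost:.2f}")
```

For workloads that spin machines up and down frequently, the difference between the two models compounds quickly.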

Ratnakar Lavu, Kohl’s, CTO

Kohl’s is a brick-and-mortar retailer with a growing digital presence.  They have two data centers with rack-and-stack servers.  They are moving to the cloud in order to scale for peak periods such as the holiday season.  Kohl’s selected Google Cloud Platform because the platform is secure, flexible, and low latency, and because Google is innovative.  The machine learning platform was also attractive to Kohl’s.

Kohl’s developers can now spin up servers on the fly.  This has increased costs, so Kohl’s is working with Google to manage that.

Scott McIntyre – Closing

Scott reiterated the primary themes that were highlighted in the keynote.  Overall this was a solid keynote for a technical crowd.

Conferences

MongoDB World 2017: Building Micro-Services Based ERP System

Jerry M Reghumadh of Capiot gave a talk on building micro-services from the ground up.  The legacy system his group replaced was monolithic and rigid.  The solution the Capiot team proposed placed each component of the ERP system into its own atomic component.  Everything on the platform was an API, and these APIs were very “chatty”.

The engineering decisions included the choice of NodeJS and MongoDB as the base technologies for the platform.  NodeJS was selected in part because of its small footprint, which lowered the barrier to entry for the application.  Java was considered, but it was too heavy for the needs of the project.  MongoDB was selected for the data persistence layer because it saves data as documents and does not require the marshaling and unmarshaling of data.  MongoDB also allowed the implementation team to use a flexible schema, and it offered greater ease of clustering and sharding versus the other options available for this project.  This allowed the developers to implement the system without relying on a dedicated database administrator.

The technology stack included:

  • NodeJS
  • ExpressJS
  • Swagger.io
  • Mongoose.js
  • Passport.js

The team implemented a governance model that required any exposed API to be declared in Swagger.  This prevented the proliferation of “rogue” APIs; any API not declared in Swagger would not work properly in the system.  Mongoose allowed the team to enforce a schema.
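Mongoose is a Node.js library, but the schema-enforcement idea is language-agnostic.  As a rough sketch of what validating documents before they reach a flexible-schema store buys you (field names and types below are hypothetical, not from the Capiot system):

```python
# Minimal sketch of schema enforcement on a document store, analogous
# to what Mongoose provides for MongoDB in Node.js.  The schema and
# field names are hypothetical.
SCHEMA = {"sku": str, "quantity": int, "unit_price": float}

def validate(doc: dict) -> dict:
    """Reject documents with missing or mistyped fields before they are
    written, instead of relying on a fully flexible schema."""
    for field, expected in SCHEMA.items():
        if field not in doc:
            raise ValueError(f"missing field: {field}")
        if not isinstance(doc[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    return doc

validate({"sku": "A-100", "quantity": 3, "unit_price": 9.5})  # passes
```

The design choice is the same one the talk described: the flexibility of a document database is kept, but a single validation chokepoint prevents "rogue" document shapes from creeping into the collection.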

Conferences

MongoDB World 2017: Using R for Advanced Analytics with MongoDB

Jane Uyvova gave a talk on analytics using MongoDB and the R statistical programming language.  She began by discussing analytics versus data insight.  R has become a standard for analyzing data due to its open source nature and easy licensing versus some legacy tools, such as SAS or SPSS.

Use Cases

  • Churn Analysis
  • Fraud Detection
  • Sentiment Analysis
  • Genomics

Use Case 1: Genomics

The human genome consists of billions of base pairs.  The dataset that was used came from HapMap.

  • HapMap3 was the dataset
  • Bioconductor was the R library used for the analysis
  • RStudio was the analysis environment
  • The mongolite connector was used to access MongoDB from R

The MongoDB data aggregation framework was used to aggregate the data by region.
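A group-by-region aggregation of this kind might look like the following pipeline sketch (collection and field names are hypothetical, not taken from the talk):

```python
# Hypothetical MongoDB aggregation pipeline that groups genomic variant
# documents by chromosomal region and counts them.  Collection and
# field names are illustrative only.
pipeline = [
    {"$match": {"population": "CEU"}},                     # filter first
    {"$group": {"_id": "$region", "count": {"$sum": 1}}},  # group by region
    {"$sort": {"count": -1}},                              # largest regions first
]

# Against a live deployment this would run via PyMongo, e.g.:
#   from pymongo import MongoClient
#   db = MongoClient()["hapmap"]
#   results = list(db.variants.aggregate(pipeline))

# Equivalent grouping logic in plain Python, to show what $group does:
docs = [
    {"population": "CEU", "region": "chr1"},
    {"population": "CEU", "region": "chr1"},
    {"population": "CEU", "region": "chr2"},
]
counts = {}
for doc in docs:
    counts[doc["region"]] = counts.get(doc["region"], 0) + 1
```

The advantage of the aggregation framework is that this grouping runs inside the database, so only the summarized results need to cross the wire into R.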

In doing genomic analysis, schema design becomes important in making the analysis easier and more effective.

Use Case 2:  Vehicle Situational Awareness

  • Chicago open data was used as the dataset
  • The dataset was loaded into MongoDB and Compass was used for the initial analysis
  • R was used to analyze the data and to extract data for a density plot (ggplot2)
  • The MongoDB flexible schema allows a wide variety of data to be included in the analysis

One issue that must be addressed is scalability.  Since R is single-threaded, data scientists run up against data volume constraints.  One solution is to use Spark to parallelize and scale R workloads.

A MongoDB/Spark architecture can include an operational component.  This operational component consists of an application cube and a MongoDB driver.  The data management component consists of the MongoDB cluster.

Conferences

MongoDB: Migrating from MongoDB on EC2 to Atlas

Atlas was introduced as MongoDB's SaaS offering of its database.  Atlas allows administrators, developers, and managers to deploy a complete MongoDB cluster in a matter of minutes.  Some basic requirements for using Atlas include:

  • Atlas requires SSL
  • Set up AWS VPC peering
  • VPN and Security Setup
  • Use Amazon DNS (when on AWS)

The preparation work that must be done includes:

  • Picking a network CIDR that won’t collide with your networks
  • Using a MongoDB 3.x engine with the WiredTiger storage engine
  • Test on replicas using testing/staging environments

Atlas supports the live migration of data from an EC2 instance.  The Mongo Migration Tool or MongoMirror can be used to migrate the data.