Top 10 Big Data Technologies to Start Adopting Today!

2016-08-09 12:13:14

Big Data is exploding, and new projects are springing up daily from companies all over the world. The good news is that all of this technology is open source and available for you to start adopting today.

Hadoop - Solid, enterprise-strength, and the basis for everything else. You need YARN, HDFS and the rest of the Hadoop infrastructure to be your primary data store and to run your key big data servers and applications.

Spark - Easy to use, supports all the important Big Data languages (Scala, Python, Java, R), has a huge and quickly growing ecosystem, and offers easy micro-batching, batching and SQL support. This is another no-brainer.

NiFi - The tool that came out of the NSA, allowing easy data ingestion, storage and processing from many sources with minimal coding and a slick UI. It offers dozens of sources and sinks, including social media, JMS, NoSQL, SQL, REST/JSON feeds, AMQP, SQS, FTP, Flume, ElasticSearch, S3, MongoDB, Splunk, email, HBase, Hive, HDFS, Azure Event Hub, Kafka and more. If there isn't a source or sink you need, writing your own Processor for it is straightforward Java code. Another great Apache project for your toolbox, this is the Swiss Army Knife of Big Data tools.

Apache Hive 2.1 - Apache Hive has been the SQL solution on Hadoop forever. With the latest release, performance and feature enhancements keep Hive the solution for SQL on Big Data.

Kafka - The choice for asynchronous, distributed messaging between Big Data systems. It comes baked into most stacks. From Spark to NiFi to third-party tools to Java to Scala, it is great glue between systems. This needs to be in your stack.

Phoenix/HBase - HBase is BigTable for open source, with tons of companies working on it and making it scale hugely. It is NoSQL backed by HDFS and well integrated with all the tools. The addition of the steadily maturing Phoenix on top of HBase is making this the go-to choice for NoSQL: Phoenix adds SQL, JDBC, OLTP and operational analytics to HBase.

Zeppelin - An easy, integrated notebook tool for working with Hive, Spark, SQL, Shell, Scala, Python and a ton of other data exploration and machine learning tools. It's very easy to work with and a great way to explore and query data. It keeps gaining support and features; it just needs better charting and mapping.

Sparkling Water - H2O fills the gap in Spark's machine learning and just works. It does all the machine learning you need.

Apache Beam - A unified framework for developing data processing pipelines in Java, which lets the same pipeline run on Spark and Flink as well. Other frameworks will come online, and you won't have to learn too many frameworks.

Stanford CoreNLP - Natural Language Processing is huge and still growing. Stanford continues to improve its framework.

Obviously, there is a huge set of Big Data projects, so your best option is to start with a base distribution that incorporates and tests the various versions of the projects and ensures they work together smoothly, including security and management. I recommend using Hortonworks Connected Data Platforms as your base.

There are a few more projects I would add if this were a top 20, notably Storm, Solr, Apache Oozie and Apache HAWQ. There is also a lot of great technology underneath that, for the most part, you don't see or know about, such as Apache Tez (though you need to configure it when running Hive), Apache Calcite, Apache Slider, Apache ZooKeeper and Livy. These projects are essential for running a Big Data infrastructure.

Interesting frameworks and tools to evaluate:
HBase/Phoenix ORM
SnappyData
Concord
Spark Succinct
Alluxio
Apache Arrow
TensorFlow
Druid
Geode
Ignite

Thoughts on Coupling in Software Design

2016-08-02 11:08:31

Coupling is a software metric that describes how closely connected two routines or modules are, and it serves as a measure of design quality. The concept was introduced by Larry Constantine in the 1960s and was formulated in a 1974 article for the IBM Systems Journal, "Structured Design", and in the 1979 book of the same name.

Given modules A and B, the more knowledge of B is required to understand A, the more closely A is connected to B. The fact that one module needs to be inspected in order to understand the operation of another indicates some degree of interconnection, even if that degree is not known.

Coupling is a measure of the strength of that interconnection. It is affected by the type of connections between modules, interface complexity, the information flow across module connections, and the binding time of those connections. Coupling is usually contrasted with cohesion: low coupling typically goes hand in hand with high cohesion, and vice versa.

Levels of Coupling

Coupling can be low (loose, weak) or high (tight, strong).

Tight coupling causes ripple effects when making changes and produces code that is difficult to understand. When one module behaves incorrectly, the error tends to propagate to other modules, which complicates debugging and fixing defects.

In loosely coupled systems, on the other hand, individual modules can be studied and altered without having to take into account much information from other modules. Errors are much easier to pinpoint, debugging takes less time, and fixing defects is usually simpler. The chances of errors propagating across modules are also reduced.

The levels of coupling below are ordered from high to low:

Content Coupling: Content coupling, or pathological coupling, occurs when one module modifies or relies on the internal workings of another module, so changing those inner workings forces changes in the dependent module. An example is a search method that, when an object is not found, adds it directly to the internal structure of the data structure used to hold the information.

Common Coupling: Common coupling, or global coupling, occurs when two or more functions share global data, so any change to that data has a ripple effect. An example is a global status flag for an operation, with multiple modules reading from and writing to that location.

External Coupling: External coupling occurs when two modules share an externally imposed data format, communication protocol, or device interface.

Control Coupling: Control coupling occurs when one module controls the flow of another by passing it information on what to do, such as a control flag or a comparison function passed to a sort algorithm.

Stamp Coupling: Stamp coupling, or data-structured coupling, occurs when modules share a composite data structure and use only part of it, possibly different parts. One example is a print module that accepts an entire Entity and retrieves the information it needs from it to construct a message.

Data Coupling: Data coupling occurs when methods share data, typically through parameters. Data coupling is better than stamp coupling, because the module takes exactly what it needs, without needing to know the layout of a particular data structure.

Message Coupling: Message coupling is the lowest form of coupling, achieved through decentralization and message passing. Examples include Dependency Injection and Observables.

Coupling Metrics

Class Level

Class-level coupling results from implementation dependencies in a system. In general, the more assumptions one class makes about another, the tighter the coupling. The strength of coupling is determined by the stability of a class, i.e., the number of changes that must be made in dependent classes if the class changes, and by the scope of access, i.e., the scope in which a class is accessed, with a wider scope introducing tighter coupling.
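To make the stamp-versus-data distinction from the levels above concrete, and to show what "assumptions one class makes about another" looks like in practice, here is a minimal Python sketch of the print example. The `Entity` class, its fields, and both function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical composite record; a printer only needs two of its fields."""
    name: str
    email: str
    internal_id: int = 0
    tags: list = field(default_factory=list)


def format_message_stamp(entity: Entity) -> str:
    """Stamp coupling: receives the whole Entity and must know its layout
    (that it exposes .name and .email), even though it ignores the rest."""
    return f"Contact {entity.name} at {entity.email}"


def format_message_data(name: str, email: str) -> str:
    """Data coupling: receives exactly the values it needs and knows nothing
    about how callers store them."""
    return f"Contact {name} at {email}"


if __name__ == "__main__":
    ada = Entity(name="Ada", email="ada@example.com", internal_id=7)
    print(format_message_stamp(ada))                 # depends on Entity's structure
    print(format_message_data(ada.name, ada.email))  # caller does the unpacking
```

If `Entity` later renamed `email`, every stamp-coupled function reading `entity.email` would have to change, while `format_message_data` would stay untouched and only its callers would adapt. The trade-off is that callers now carry the unpacking logic.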
At class level, the degree of coupling is measured as the ratio of the number of messages received to the number of messages passed:

$$DC = \frac{MRC}{MPC}$$

where MRC is the received message coupling (the number of messages received by a class from other classes) and MPC is the passed message coupling (the number of messages sent by a class to other classes). The class-level metric is a particular case of the module-level metric.

Module Level

This more general metric tracks coupling to other modules, global data, and the outside environment. It computes a module coupling indicator $m_c$:

$$m_c = \frac{k}{M}$$

where $k$ is a proportionality constant and $M$ is computed as:

$$M = d_i + a \cdot c_i + d_o + b \cdot c_o + g_d + c \cdot g_c + w + r$$

In the formula above:
$a$, $b$, and $c$ are empirically defined constants;
$d_i$, $c_i$, $d_o$, and $c_o$, the numbers of data and control input and output parameters, are the data and control flow parameters;
$g_d$ and $g_c$, the numbers of global variables used as data and as control, are the global coupling parameters;
$w$, the number of modules called (fan-out), and $r$, the number of modules calling the module under consideration (fan-in), are the environmental coupling parameters.

Note that as $m_c$ increases, the overall coupling decreases. For the metric to increase as the degree of coupling increases, a revised coupling metric $C$ can be defined as:

$$C = 1 - m_c$$

Decoupling

Introducing coupling increases the instability of a system. Decoupling is the systematic reduction of coupling between modules with the explicit intent of making them more independent, i.e., minimizing the value of $C$ as defined in the previous section.

Content coupling can be eliminated by enforcing encapsulation.
Common coupling can be resolved by introducing abstractions; design patterns can help in achieving a good architecture.
External coupling can be resolved by removing knowledge of formats from the domain and operating on concepts instead.
Control coupling can be eliminated by using strategies or states.
Stamp coupling can be eliminated by passing actual data.
Data coupling can be eliminated by employing message passing.

One very important guiding principle for reducing coupling is the Law of Demeter, presented below.

Law of Demeter

Also referred to as the principle of least knowledge, the Law of Demeter is a specific case of loose coupling. The principle states that a unit should only have knowledge of and talk to closely related units, assuming as little as possible about the structure and properties of anything it interacts with, including its own subcomponents. For example, an object A could call functionality on object B, but should not reach through B to access an object C for its functionality. Instead, object B should facilitate access through its own interface, propagating the request to its subcomponents. Alternatively, A could hold a direct reference to C.

A more formal definition states that a method M on an object O may only invoke the methods of the following objects:
O itself
M's parameters
Any objects created/instantiated within M
O's direct subcomponents
A global variable, accessible by O, in the scope of M

In particular, an object should not call a method on an object returned by another call; as a rule of thumb, there should be at most one dot in an expression, e.g., a.Method() and not a.B.Method().

Conclusions

Coupling is unavoidable; otherwise each module would be its own program.
However, achieving low coupling should be one of the primary objectives in system design, so that individual modules can be studied and altered without having to take into account much information from other modules, errors can be pinpointed more easily, debugging takes less time, and fixing defects is simpler.

Loose coupling and high cohesion go hand in hand, and together they lead to maintainable systems.
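The Law of Demeter discussed earlier can be sketched in a few lines of Python using the classic "customer and wallet" illustration of the A/B/C example: A is the calling function, B is `Customer`, C is `Wallet`. All names here (`Wallet`, `Customer`, `pay`, `charge_violating`, `charge_following`) are hypothetical, chosen only to show the shape of the principle.

```python
class Wallet:
    """Hypothetical subcomponent owned by a Customer."""

    def __init__(self, balance: float) -> None:
        self._balance = balance

    @property
    def balance(self) -> float:
        return self._balance

    def withdraw(self, amount: float) -> None:
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


class Customer:
    def __init__(self, wallet: Wallet) -> None:
        # Public only so the "violating" caller below can reach through it.
        self.wallet = wallet

    def pay(self, amount: float) -> None:
        """Customer exposes the operation itself and forwards it to its own
        direct subcomponent, so callers never need to know a Wallet exists."""
        self.wallet.withdraw(amount)


def charge_violating(customer: Customer, amount: float) -> None:
    # Two dots: reaches through Customer into its Wallet (A -> B -> C),
    # coupling this function to Customer's internal structure.
    customer.wallet.withdraw(amount)


def charge_following(customer: Customer, amount: float) -> None:
    # One dot: talks only to its parameter, as the Law of Demeter suggests.
    customer.pay(amount)
```

Both functions have the same effect today. The difference shows up when `Customer` changes how it pays (say, by adding another payment source): only `Customer.pay` needs updating for the compliant caller, while every caller written like `charge_violating` breaks.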

Solar-powered plane circles globe, returns to UAE

2016-07-26 08:23:42

ABU DHABI - A solar-powered aircraft successfully completed the first fuel-free flight around the world on Tuesday, returning to Abu Dhabi after an epic 16-month voyage and demonstrating the potential of renewable energy.

The plane, Solar Impulse 2, touched down in the United Arab Emirates capital at 0005 GMT (0405 local time) on Tuesday. It first took off from Abu Dhabi on March 9, 2015, beginning a landmark journey of about 40,000 km (24,500 miles) around the globe and nearly 500 hours of flying. Unfavorable weather at times hindered smooth flying, causing the plane to be grounded for months in some countries.

Swiss explorers Bertrand Piccard and André Borschberg, Solar Impulse's founders and pilots, took turns piloting the aircraft, which has a wingspan larger than a Boeing 747 and weighs only as much as a family car. The Swiss team is campaigning to bolster support for clean energy.

The propeller-driven aircraft's four engines are powered exclusively by energy collected from more than 17,000 solar cells built into the plane's wings. Excess energy is stored in four batteries during daylight hours to keep the plane flying after dark.

Over its entire mission, Solar Impulse 2 cruised at altitudes of up to 9,000 meters and at an average speed of between 45 and 90 km (28 and 56 miles) per hour. The plane made 16 stopovers along the way, including in Oman, India, Myanmar, China, Japan, the United States, Spain and Egypt.

Abu Dhabi's green energy firm Masdar is the official host partner of Solar Impulse 2. Oil-rich Abu Dhabi is investing billions in industry, tourism and renewables to diversify its economy away from oil. (Reporting by Stanley Carvalho, editing by Sami Aboudi and Hugh Lawson)

Yahoo reports lackluster results as sale looms

2016-07-19 06:48:57

Yahoo Inc's (YHOO.O) quarterly earnings fell short of Wall Street expectations on Monday in what may be the company's last financial report before it sells its core business.

Yahoo reported adjusted earnings of 9 cents per share, short of the 10 cents that analysts expected. It also announced a $482 million write-down on the value of Tumblr, the social media service that it acquired in 2013 for $1.1 billion.

Yahoo is in the process of auctioning off its search and advertising business, and is expected to choose a winner this week. The company said its board has made "great progress on strategic alternatives" but did not comment further on the auction process.

Verizon Communications Inc and AT&T Inc are said to be in the running to acquire the core business, along with private equity firm TPG Capital and a consortium led by Quicken Loans founder Dan Gilbert and backed by billionaire Warren Buffett.

Yahoo also owns large stakes in Chinese ecommerce giant Alibaba and Yahoo Japan, which are worth far more than the company's internet business.

Monday's earnings report showed the continued slide in Yahoo's business during the protracted sale process. After the Tumblr write-down, the company posted a net loss of $439.9 million, or 46 cents per share, compared with a loss of $21.6 million, or 2 cents per share, a year earlier.

Although total revenue rose to $1.31 billion from $1.24 billion a year earlier, the seeming improvement was the result of a change in the way the cost of acquiring traffic is counted. After deducting fees paid to partner websites for traffic, revenue fell to $841.2 million from $1.04 billion.

Estimating that Tumblr is worth "nothing" at this point, Ross Gerber, cofounder and CEO of Gerber Kawasaki Wealth and Investment Management, said potential buyers were likely bidding lower than Yahoo believes it is worth.

"I can't imagine why the sale process is taking so long, the only thing I can think of is it's being overpriced. This report doesn't further create an impression that paying up for these assets has any value," Gerber said.

Revenue in the company's emerging businesses, which Chief Executive Officer Marissa Mayer calls Mavens (mobile, video, native and social advertising), showed some life, rising 25.7 percent to $504 million in the second quarter ended June 30.

But the improvement in Mavens was offset by decreases in gross search revenue, which is only expected to get worse, said B. Riley & Co analyst Sameet Sinha. Gross search revenue for the quarter was $765 million, down 17 percent from the same period last year.

"This is supposed to be the growth engine of the company, and at best it was up slightly year over year. That shows that even in high-growth categories like mobile and native they're losing their search impact," he said.

JMP Securities analyst Ronald Josey said search revenues are a significant portion of Yahoo's overall revenues and their continued decline could definitely be a factor in the sale negotiations. "If search continues to decline as much as it has, that's something that's going to be called into question," he said.

In a conference call, Yahoo Chief Financial Officer Ken Goldman touted the company's cost-cutting efforts. "Through excellent expenditure management of cost and capital, we achieved above the high-end of our guidance on adjusted EBITDA and significantly increased cash flow," he said, referring to earnings before interest, taxes, depreciation and amortization.

Yahoo's shares were little changed at $37.92 in trading after the bell. (Reporting by Supantha Mukherjee in Bengaluru; Editing by Chris Reese and Jonathan Oatis)

Split and Clone Editor Views in Eclipse

2016-07-12 04:24:10

Sometimes it is all about knowing the simple tricks in Eclipse that make life easier, like this one: how to split an editor view so I can edit several different sections of a source file at the same time.

This feature is available in Eclipse Luna and later, but because there is no icon for it in the view itself, as there is in Microsoft Word, I have found that many people do not know about it. The screenshots below are for Eclipse Luna.

Split Editor View

To split an editor view, I select it (to make it active), then use the menu 'Toggle Split Editor'. I can split it horizontally or vertically, and I can use the mouse to resize the split area. To remove the split, simply use the menu or shortcut again.

Clone Editor View

The other useful function is cloning an editor view, which creates a second editor showing the same file. To 'undo' the cloning, I close the new editor view.

Summary

Splitting and cloning give me a way to edit different portions of the same source file. The Clone and Split commands are under the Window > Editor menu.

Happy Cloning and Splitting!
