Growing pains for Wikipedia | CNET News.com

There's been quite a bit of discussion over the recent problems with articles on Wikipedia, from Dave Winer to Dan Gillmor and News.com (see linked quotes below). Much of this discussion focuses on the authority and accountability of Wikipedia and similar open source content. Update: More folk are picking up on this, including Business Week.

"First, in a Nov. 29 op-ed piece in USA Today, a former administrative assistant to Robert Kennedy lambasted the free online reference work for an article that suggested he may have been involved in the assassinations of both Robert F. Kennedy and John F. Kennedy.

"Then, on Dec. 1, a new flurry of attention came when former MTV VJ and podcasting pioneer Adam Curry was accused of anonymously editing out references to other people's seminal podcasting work in an article about the hot new digital medium.

"To critics of Wikipedia--which, in a spin on the open-source model, lets anyone create and edit entries--the news was further proof that the service has no accountability and no place in the world of serious information gathering."
-- Growing pains for Wikipedia at CNET News.com

Dave Winer asks

... the bigger problem is that Wikipedia is so often considered authoritative. That must stop now, surely. Every fact in there must be considered partisan, written by someone with a conflict of interest. Further, we need to determine what authority means in the age of Internet scholarship. And we need to take a step back and ask if we really want the participants in history to write and rewrite the history. Isn't there a place in this century for historians, non-participants who observe and report on the events?
-- Dave Winer

And Dan Gillmor answers

Yes. They're also called journalists. We need them more than ever.
-- "Observing, Reporting" by Dan Gillmor

Yet recent events in mainstream media have shown that professional journalists are not always any more authoritative than citizen journalists or citizen encyclopedists.

The problem is with the readership. Though more than 35 years old now, I still have the text from my high school freshman year logic course. Here are the common types of logical fallacies.

Common Logical Fallacies
a - Confusion of sequence and causality (post hoc, ergo propter hoc): Here we suppose that what precedes an effect is its cause (Often called the "fallacy of false causes").
b - Argument from Modernity: This supposes that what is new is true (correct) and that what is fashionable is right or correct.
c - Argument from Antiquity: This supposes that what is old and has been accepted for years is true or best, necessarily.
d - Argument from Authority: Here it is supposed that what an authority on the matter says is true, necessarily.
e - Argument from the Mean: This supposes that the individual cases from which the mean was taken - or from which the average was obtained - must conform exactly to that average.
f - Argument from History: This supposes that what has been proved by documentation to be true or good of or for one country, city, person, etc, must necessarily therefore be true or good for all or many or any others.
-- Logic Notes (A Brief, Philosophical Introduction to Logic) typed, not published, no copyright

So, whether you're reading Wikipedia or Encyclopaedia Britannica, a blog or the N.Y. Times, read critically, think properly, think accurately, think logically.

Ingres and Enterprise Acceptance

Ingres is one of the early relational databases. For those who have been working on relational databases for some time, Ingres has a reputation for being a mature database. By becoming an Open Source database, Ingres has made the choice among Open Source databases more interesting.

Last November, Computer Associates International, Inc. and a private equity firm, Garnett & Helfrich Capital, announced a partnership to divest CA's Ingres open source database unit into an independent corporate entity, Ingres Corporation.

Ingres Corporation also announced its management team, which includes former Veritas Software executives.

Will Ingres be a key force in enabling the enterprise acceptance of Open Source Databases?

Edwin Aoki Tag Spam Fighter

Edwin Aoki of AOL presented at Tag Tuesday. AOL is beginning its fight against Tag Spam, or Spag, before the problem becomes huge, and Edwin is asking other tag service providers [TSPs?, TaaS?] to join in. Edwin presented a number of interesting ideas, and he was very focused on his topic. I found that Edwin was really addressing the larger topic of tag usability. Many of the ideas and statistics that Edwin presented reflect not only the idea of Spag, but also the idea of bringing tagging to the mainstream.

If the overuse of fairly generic tags like "Open Source" or "Health and Wellness" has already led to tag pollution, how useful will tags be to the casual user? If such tags are already "polluted" and unusable, will tags become so specific that only a handful of folks will find what they want through tagging?

For example, we met with the folks from Kinetic Networks yesterday to discuss their open source ETL tool, KETL. I'm still writing the article, but the tags I'm using are "Computers and Internet", "Open Source", "Business Intelligence", Bizgres, ETL, KETL and "Kinetic Networks". Are at least two of those tags too polluted to be useful? Will someone looking for information on an open source ETL tool be able to find articles that are tagged as I tagged this one? Or will these tags only prove useful to someone specifically looking for information about Kinetic Networks and KETL?

Since the power of tagging is the human factor, the developing folksonomies, how will mainstream users, private folksonomies in intranets, and automated tagging suggestions affect this power?

I think that tags must be viewed through unions and intersections of meaning and context if they are ever to be useful to mainstream users in either a business or personal environment. The potential is there, but we need to understand tagging at a far more sophisticated level than current approaches allow.
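To make the idea of unions and intersections a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the article ids, the sample tags for the other posts, and the helper names (`build_tag_index`, `with_all`, `with_any`) are hypothetical and do not describe how any tag service actually works.

```python
from collections import defaultdict


def build_tag_index(articles):
    """Map each tag (lower-cased) to the set of article ids carrying it."""
    index = defaultdict(set)
    for article_id, tags in articles.items():
        for tag in tags:
            index[tag.lower()].add(article_id)
    return index


def with_all(index, *tags):
    """Intersection: articles that carry every one of the given tags."""
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


def with_any(index, *tags):
    """Union: articles that carry at least one of the given tags."""
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return set().union(*sets)


# Hypothetical sample data, loosely modelled on the posts on this page.
articles = {
    "ketl-first-meeting": ["Open Source", "Business Intelligence", "ETL", "KETL"],
    "ingres-enterprise": ["Open Source", "Databases"],
    "six-bi-products": ["Business Intelligence"],
}

index = build_tag_index(articles)

# A generic tag alone matches too much; intersecting it with a
# specific tag narrows the result to the one relevant article.
narrow = with_all(index, "Open Source", "ETL")
broad = with_any(index, "Open Source", "Business Intelligence")
print(sorted(narrow))
print(sorted(broad))
```

In this toy data, "Open Source" on its own is too broad to help, but intersecting it with "ETL" leaves only the KETL article, which is the kind of combination a casual user would need a tagging interface to offer.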

Comparison of Six BI Products

I found an interesting article last night from Network Computing. The article, "Business Intelligence Suites Gold Standard BI", compares six Business Intelligence products: Actuate, Applix, Cognos, Information Builders, Microsoft and MicroStrategy. Using Network Computing's corporate data stored in Oracle9i, the products were evaluated based on their ability to leverage corporate infrastructure, including using Active Directory 2000 as an authentication source and Exchange 2000 for e-mail distribution of reports.

For those involved with Business Intelligence and Data Warehousing, it is an article worth reading.

KETL First Meeting

Clarise and I met with Marshall Stevenson, Juan Carlos Rojas and Nicholas E. C. Wakefield of Kinetic Networks, the company behind KETL. The open source KETL Framework is the ETL tool providing data gathering and data quality for the Bizgres Clickstream initiative.

Marshall, Clarise and I are all ex-Oracle, and worked together at a boutique consultancy eight years ago. In addition, Marshall has been specializing in BI solutions since then, and has even provided us with ETL specialists as supplemental staff for our own projects. Marshall recently joined Kinetic Networks as their VP of Sales, and set up this meeting.

With KPMG and IBM in his background, Juan founded Kinetic Networks in 1999, and is currently serving as the CEO. Kinetic Networks provided a data warehousing platform in a Software as a Service environment, as well as Business Intelligence System Integration, delivering BI solutions with a heavy emphasis on operational excellence.

After graduating from Oxford University, Nick worked at EDS and MicroStrategy before joining Kinetic Networks in 2000. Based upon Nick's work and their success with KETL, the company split its focus between System Integration and Technology development earlier this year.

A custom DW prototype for a client led to the first iterations of KETL. Under Nick's guidance it soon expanded, adding XML capabilities and a plugin architecture using their own Java classes. From their SI group's best practices for Data Quality, they added the capability for data QA on the fly. It wasn't long before they found a customer who wanted the tool more than System Integration, and a product strategy came about.

KETL has been used in production environments for over three years. Some of their experiences include:

  • customers choosing KETL because the cost savings on the "backend" ETL tool allowed for a better front end Reporting, OLAP and Portal solution
  • running KETL alongside commercial ETL, with a custom plugin allowing KETL to perform data quality for an existing install
  • handling hundreds of millions of records with KETL in its open source form
  • letting the use of KETL in production environments drive its development

"Performance is tuned in areas that are important to our clients to date. The ability to read multiple files across multiple devices in parallel, using multiple CPUs is something KETL can do relatively easily, our clients would not have used KETL extensively otherwise.

"Historically KETL has been used for commercial implementations along with services; packaging has not been the focus. Over time this will improve."
-- Nick Wakefield, CTO Kinetic Networks

KETL was made from the beginning to handle very large data sets. This, coupled with its ability to perform Statistical Analysis for on-the-fly data quality, makes KETL a perfect tool for Clickstream analysis. This capability, together with work alongside Greenplum at one customer, led to the open sourcing of the KETL Framework and its inclusion in the Bizgres stack.
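To give a feel for what statistical data quality "on the fly" can mean, here is a small Python sketch. It is not KETL's method, which I have not seen described in detail; it is a generic illustration using Welford's running mean and variance, and the names `RunningStats` and `flag_outliers`, the threshold, the warm-up length and the sample values are all my own assumptions.

```python
import math


class RunningStats:
    """Running mean and standard deviation (Welford's method), one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def flag_outliers(values, threshold=3.0, warmup=5):
    """Return (position, value) pairs lying more than `threshold` standard
    deviations from the running mean, checked as each value streams past."""
    stats = RunningStats()
    flagged = []
    for position, value in enumerate(values):
        if stats.count >= warmup and stats.stddev > 0:
            if abs(value - stats.mean) > threshold * stats.stddev:
                flagged.append((position, value))
                continue  # keep the suspect value out of the running statistics
        stats.update(value)
    return flagged


# Hypothetical clickstream measurements: page load times in milliseconds.
page_load_ms = [120, 130, 125, 118, 122, 127, 5000, 121, 124]
suspects = flag_outliers(page_load_ms)
print(suspects)
```

The point of the running statistics is that nothing has to be stored or re-read: each record is checked against what has been seen so far, which is what makes this kind of check feasible on very large data sets.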

Any company wishing to open source their product must decide how best to join the open source community. Choosing a license was very difficult. The KETL Framework is licensed under the LGPL. This allowed them to build commercial components upon it. The MPP and Data Quality modules are licensed commercially through an upgrade path.

Another choice to be made involves building a community around KETL. Currently, KETL relies on the Bizgres community. However, KETL works against any target database, and through its plugin architecture, can work with any source system. Thus, Kinetic Networks is looking towards building its own community.

Kinetic Networks has grown organically since its inception, without requiring any external funding. As such, it relies on its System Integration group, including remote management of customer business intelligence solutions, for cash flow as it grows its open source and product strategy.

As we proceed with our Open Source Business Intelligence book project, we're planning to stay in touch with Marshall, Nick and Juan, doing more interviews with them, and even a podcast or two. Bernard will be applying his Open Source Maturity Model, and we're looking forward to playing with KETL in the lab. We'll keep you posted.

While Kinetic Networks is finalizing their open source strategy, there are no links directly from the Kinetic Networks website to information about KETL. However, Nick has graciously provided the following links for the convenience of our readers. [As per the comment below, the following documentation is no longer available. Please check the documentation section of KETL.org.]

  • KETL for Data Integration [PDF]
  • KETL XML Steps Overview [PDF]
  • KETL Training [PDF]
  • KETL Installation Guide [PDF]
  • KETL System Operations Guide [PDF]
  • Documentation related to Bizgres and the stack
  • Software, currently it's part of Bizgres Clickstream only

The TeleInterActive Press is a collection of blogs by Clarise Z. Doval Santos and Joseph A. di Paolantonio, covering the Internet of Things, Data Management and Analytics, and other topics for business and pleasure.
