
Shout Future

Educational blog about Data Science, Business Analytics and Artificial Intelligence.

TAPPING A NEURAL NETWORK TO TRANSLATE TEXT IN CHUNKS


Facebook's research arm is coming up with better ways to translate text using AI.
Facebook’s billion-plus users speak a plethora of languages, and right now, the social network supports translation of over 45 different tongues. That means that if you’re an English speaker confronted with German, or a French speaker seeing Spanish, you’ll see a link that says “See Translation.”
But on Tuesday, Facebook announced that its machine learning experts have created a neural network that translates language up to nine times faster, and more accurately, than current systems that use the standard method to translate text.
The scientists who developed the new system work at the social network’s FAIR group, which stands for Facebook A.I. Research.
“Neural networks are modeled after the human brain,” says Michael Auli, a FAIR researcher behind the new system. One of the problems a neural network can help solve is translating a sentence from one language to another, such as French into English. The network could also be used for tasks like summarizing text, according to a blog post Facebook published about the research.
But there are multiple types of neural networks. The standard approach so far has been to use recurrent neural networks, which look at one word at a time and then predict what the output word in the new language should be, learning the sentence as they read it. The Facebook researchers tapped a different technique, called a convolutional neural network, or CNN, which looks at words in groups instead of one at a time.
“It doesn’t go left to right,” Auli says, of their translator. “[It can] look at the data all at the same time.” For example, a convolutional neural network translator can look at the first five words of a sentence, while at the same time considering the second through sixth words, meaning the system works in parallel with itself.
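To make that contrast concrete, here is a minimal PyTorch sketch, not Facebook's actual fairseq model, showing how a 1-D convolution sees a window of neighbouring word embeddings in one parallel pass, while a recurrent layer has to walk the sentence step by step; the vocabulary size, embedding width and kernel size are arbitrary illustration values.

import torch
import torch.nn as nn

vocab_size, embed_dim, kernel = 1000, 64, 5     # illustrative sizes, not Facebook's settings
tokens = torch.randint(0, vocab_size, (1, 12))  # one 12-word "sentence" of random token ids

embed = nn.Embedding(vocab_size, embed_dim)
x = embed(tokens)                               # shape: (batch, words, embed_dim)

# Convolutional encoder: each output position sees a window of `kernel` words,
# and every window is computed in parallel.
conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=kernel, padding=kernel // 2)
conv_out = conv(x.transpose(1, 2))              # one pass over the whole sentence

# Recurrent encoder: the GRU reads left to right, so each step waits for the previous one.
gru = nn.GRU(embed_dim, embed_dim, batch_first=True)
rnn_out, _ = gru(x)

print(conv_out.shape, rnn_out.shape)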
Graham Neubig, an assistant professor at Carnegie Mellon University’s Language Technologies Institute, researches natural language processing and machine translation. He says that this isn’t the first time this kind of neural network has been used to translate text, but that this seems to be the best he’s ever seen it executed with a convolutional neural network.
“What this Facebook paper has basically showed— it’s revisiting convolutional neural networks, but this time they’ve actually made it really work very well,” he says.
Facebook isn’t yet saying how it plans to integrate the new technology with its consumer-facing product yet; that’s more the purview of a department there call the applied machine learning group. But in the meantime, they’ve released the tech publicly as open-source, so other coders can benefit from it
That’s a point that pleases Neubig. “If it’s fast and accurate,” he says, “it’ll be a great additional contribution to the field.”
May 09, 2017
HOW TO LEARN MACHINE LEARNING?

Learning machine learning is not a big deal, but if you want to become an expert in any field you need a mentor. So try to follow some professional machine learning blogs like Shout Future, KDnuggets and Analytics Vidhya, and if you have any doubts, clarify them through forums or ask directly in the comments.


I realised that the growth of machine learning would be incredible, so I started to learn machine learning back in 2013. I started from scratch and was confused a lot, because I didn't know what to do or where to start.
I think you may be in the same position. Here is the step-by-step learning process I would recommend to become a professional machine learning engineer yourself.

1. Getting Started:

  • Find out what machine learning is.
  • Learn the skills needed to become a machine learning engineer.
  • Attend conferences and workshops.
  • Interact with experienced people directly or through social media.

2. Learn the Basics of Mathematics and Statistics:

  • Learn descriptive and inferential statistics with the Udacity courses.
  • Take a linear algebra course from Khan Academy or MIT OpenCourseWare.
  • Learn multivariate calculus with Calculus One.
  • Learn probability with the edX course.

3. Choose your tool: Learn R or Python:

Learn R:
  • R is easier to learn than Python.
  • Interactive introduction to the R programming language by DataCamp.
  • Exploratory Data Analysis by Coursera.
  • Start to follow R-Bloggers.
Learn Python:
  • Start your programming with Google's Python Class.
  • Intro to Data Analysis by Udacity.

4. Basic and Advanced machine learning tools:

  • Machine Learning Course by Coursera.
  • Machine Learning classification by Coursera.
  • Intro to Machine Learning by Udacity.
  • Blogs and guides like Shout Future, Machine Learning Mastery, etc.
  • Algorithms: Design and Analysis 1 
  • Algorithms: Design and Analysis 2

5. Build your Profile:

  • Start building your GitHub profile.
  • Start practicing in Kaggle competitions.
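Before entering a competition, it helps to see how little code a first end-to-end model takes. The sketch below is only an illustration; it uses scikit-learn's built-in iris dataset and a random forest as stand-ins for whatever data and model a real project would use.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset (a stand-in for a real Kaggle dataset).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a simple baseline model.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on held-out data.
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))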
That's it. With these skills you can enter what is now called the sexiest job in the world: "Data Scientist". Plan well and follow these steps carefully.
You have a long way to go to become an expert in this field, so start your journey today and separate yourself from the crowd.
Please comment with your ideas and opinions.


May 09, 2017

While there is no doubt that the insurance segment is witnessing an unprecedented annual growth, insurers continue to struggle with loss-leading portfolios and lower insurance penetration among consumers. Insurers are facing increasing pressure to strike the right balance, while ensuring adherence to underwriting and claims decisions in the face of regulatory pressures, growth of digital channels and increasing competition. Adding to this is the need to secure the good risks, while weeding out the bad risks. 
Insurers are turning their attention towards big data and analytics solutions to help check fraud, recognize misrepresentation and prevent identity theft. With the government’s recent push to adopt digitization, the Aadhaar card plays a crucial role, linking income tax permanent account numbers (PANs), banks, credit bureaus, telecoms and utilities and providing a unified and centralized data registry that profiles an individual’s economic behaviour. The e-commerce boom provides additional data on financial behaviour. 

 Fraudulent practices 

Claims fraud is a threat to the viability of the health insurance business. Although health insurers regularly crack down on unscrupulous healthcare providers, fraudsters continually exploit any new loopholes with forged documents purporting to be from leading hospitals. 
 Medical ID theft is one of the most common techniques adopted by fraudsters: through identity theft, claim funds are diverted into the fraudsters' bank accounts. The insurer's procedures allow the policyholder to send a scanned image of his/her cheque, with the bank account details, for ID purposes, and it is this image that the fraudsters manipulate. 
Besides forged documents, other common sources of fraud come from healthcare providers themselves, with cases of ‘upgrading’ (billing for more expensive treatments than those provided), ‘phantom billing’ and ‘ganging’ (billing for services provided to family members or other individuals accompanying the patient, but not delivered). 
 Health insurers have to take action before an insurance claim is paid and to put an end to the ‘pay-and-chase’ approach. Using data to validate a pre-payment would be far more useful than having to ‘chase’ for a payment. This approach, however, rests on real-time access to information sources. 

 Life insurance’s woes 

India’s life insurers suffer from low persistency rates that see more than one in three policies lapse by the end of the second year. This may be attributed to mis-selling, misrepresentation of material facts, premeditated fabrication and in other cases suppression of facts. 
Life insurers have been facing fraud that is largely data driven and can be curbed with effective use of data analytics. While seeking customer information, insurers should perform checks against public record databases to ensure they have insights into the validity of personal information. This can be achieved through data mining and validation from various sources. For instance, in the US, frauds are committed through stolen social security numbers or driver’s license numbers, or those of deceased individuals. Data accessed from various sources will help identify if the person in question is using multiple identities or multiple people are using the identity presented. 
 The use of public, private and proprietary databases to obtain information not typically found in an individual’s wallet to create knowledge-based authentication questions which are designed to be answered only by the correct individual can also help reduce fraud significantly. 
 Continuous evaluation of existing customers is also critical for early fraud detection. For example, one red flag for potential fraud can involve beneficiary or address changes for new customers. Insurers should verify address changes, as many consumers do not know their identity has been stolen until after it has happened. By applying relationship analytics, insurers can obtain insights into the relationship between the insured, the owner, and the beneficiary, to help determine whether those individuals are linked to other suspicious entities or are displaying suspicious behaviour patterns. 

 Solutions for all 

Like in most developed insurance markets, it is imperative that data on policies, claims and customers be made available on a shared platform, in real-time. Such a platform can allow for real-time enquiries on customers. It can also facilitate screening of the originator of every proposal. Insurers would contribute policy, claims and distributors’ information to the repository on a regular basis. Such data repositories can provide insights to help insurers detect patterns, identify nexus and track mis-selling. 
 Insurance data is dynamic and hence data analytics cannot depend only on past behaviour patterns. So data has to be updated regularly. Predictive analysis can play a significant role in identifying distributor nexus, mis-selling and repeated misrepresentations. Relationship analytics could be used to identify linked sellers and suspected churn among them. 
 These data platform-based solutions are not just about preventing reputational risk and loss of business, but with controlled and more informed risk selection, there could be a positive impact on pricing of products. The whole process of underwriting new business with greater granularity of risk and greater transparency can bring in new customers, but it could also out-price some others. There can be increased scrutiny of agents, brokers and distributors to eliminate any suspects from the system. 
 Successful fraud prevention strategies include shifting towards a proactive approach that detects fraud prior to policy issuance, and leveraging red flags or business rules, real-time identity checks, relationship analytics, and predictive models. Insurers who leverage both internal data and external data analytics will better understand fraud risks throughout their customer life cycles, and will be more prepared to detect and mitigate those risks.
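None of the insurers' actual systems are public, so the sketch below is only a toy illustration of the "red flag" idea described above; the field names, rules and weights are invented.

# Hypothetical red-flag scoring for a claim; fields, rules and weights are invented.
def red_flag_score(claim: dict) -> int:
    score = 0
    if claim.get("beneficiary_changed_within_days", 9999) < 30:
        score += 2   # beneficiary changed soon after the policy was issued
    if claim.get("address_changed_recently", False):
        score += 1   # unverified address change
    if claim.get("provider_flagged", False):
        score += 3   # provider previously linked to upgrading or phantom billing
    if claim.get("claim_amount", 0) > 10 * claim.get("avg_claim_amount", 1):
        score += 2   # claim far above the policyholder's history
    return score

claim = {"beneficiary_changed_within_days": 12, "claim_amount": 250000, "avg_claim_amount": 18000}
print("Red-flag score:", red_flag_score(claim))  # higher scores are routed to investigators

In a real system, rules like these would sit alongside predictive models and relationship analytics rather than replace them.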
May 09, 2017
Table of Contents:

Introduction
Nature of Data
  1. Time series data
  2. Spatial data
  3. Spatio-temporal data
Categories of Data
  1. Primary data
     1. Direct personal interviews
     2. Indirect oral interviews
     3. Information from correspondents
     4. Mailed questionnaire method
     5. Schedules sent through enumerators
  2. Secondary data
     1. Published sources
     2. Unpublished sources


Data gathering techniques 

Introduction:
Everybody collects, interprets and uses information, much of it in numerical or statistical form, in day-to-day life. People receive large quantities of information every day through conversations, televisions, computers, radios, newspapers, posters, notices and instructions. It is precisely because so much information is available that people need to be able to absorb, select and reject it.

 In everyday life, and in business and industry, certain statistical information is necessary, and it is important to know where to find it and how to collect it. As consumers, everybody has to compare prices and quality before making any decision about what goods to buy. As employees of any firm, people want to compare their salaries, working conditions, promotion opportunities and so on. The firms, on their part, want to control costs and expand their profits.

One of the main functions of statistics is to provide information which will help in making decisions. Statistics provides this type of information by giving a description of the present, a profile of the past and an estimate of the future.

The following are some of the objectives of collecting statistical information.
1. To describe the methods of collecting primary statistical information.
2. To consider the stages involved in carrying out a survey.
3. To analyse the process involved in observation and interpretation.
4. To define and describe sampling.
5. To analyse the basis of sampling.
6. To describe a variety of sampling methods.

A statistical investigation is comprehensive: it requires systematic collection of data about some group of people or objects, describing and organizing the data, analyzing the data with the help of different statistical methods, summarizing the analysis, and using the results for making judgements, decisions and predictions.
The validity and accuracy of the final judgement are crucial and depend heavily on how well the data were collected in the first place. The quality of the data will greatly affect the conclusions, hence the utmost importance must be given to this process and every possible precaution should be taken to ensure accuracy while collecting the data.

Nature of data:
It may be noted that different types of data can be collected for different purposes. The data can be collected in connection with time or geographical location or in connection with time and location.
The following are the three types of data:
1. Time series data.
2. Spatial data.
3. Spatio-temporal data.

Time series data:
It is a collection of a set of numerical values, collected over a period of time. The data might have been collected either at regular intervals of time or irregular intervals of time.
Spatial Data:
If the data collected is connected with that of a place, then it is termed as spatial data. For example, the data may be
1. Number of runs scored by a batsman in different test matches in a test series at different places.
2. District wise rainfall in a state.
3. Prices of silver in four metropolitan cities.
Spatio-temporal Data:
If the data collected is connected with time as well as place, then it is known as spatio-temporal data.
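A short pandas sketch can make the three distinctions concrete; the rainfall figures below are invented purely for illustration.

import pandas as pd

# Time series data: one place, values observed over a period of time (invented figures).
rainfall_over_time = pd.Series(
    [95.0, 80.3, 120.7],
    index=pd.to_datetime(["2017-01-31", "2017-02-28", "2017-03-31"]),
    name="rainfall_mm",
)

# Spatial data: one point in time, values observed across places.
rainfall_by_district = pd.Series(
    [95.0, 40.2, 120.7],
    index=["District A", "District B", "District C"],
    name="rainfall_mm",
)

# Spatio-temporal data: values indexed by both place and time.
rainfall_spatio_temporal = pd.DataFrame({
    "district": ["District A", "District A", "District B", "District B"],
    "month": pd.to_datetime(["2017-01-31", "2017-02-28", "2017-01-31", "2017-02-28"]),
    "rainfall_mm": [95.0, 80.3, 40.2, 55.9],
})

print(rainfall_over_time, rainfall_by_district, rainfall_spatio_temporal, sep="\n\n")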

Categories of data:
Any statistical data can be classified under two categories depending upon the sources utilized. These categories are,
1. Primary data
2. Secondary data

Primary data:
Primary data is data collected by the investigator himself for the purpose of a specific inquiry or study. Such data is original in character and is generated by surveys conducted by individuals, research institutions or other organisations.
For example, if a researcher is interested in knowing the impact of a noon-meal scheme on school children, he has to undertake a survey and collect data on the opinions of parents and children by asking relevant questions. Such data, collected for this purpose, is called primary data.

The primary data can be collected by the following five methods.
1. Direct personal interviews.
2. Indirect Oral interviews.
3. Information from correspondents.
4. Mailed questionnaire method.
5. Schedules sent through enumerators.

1. Direct personal interviews:
The persons from whom information is collected are known as informants. The investigator personally meets them and asks questions to gather the necessary information. It is a suitable method for intensive rather than extensive field surveys, and it suits best the intensive study of a limited field.

Merits:
1. People willingly supply information because they are approached personally; hence this method yields a better response than any other.
2. The information collected is likely to be uniform and accurate, since the investigator is there to clear the doubts of the informants.
3. Supplementary information on the informant’s personal aspects can be noted. Information on character and environment may help later in interpreting some of the results.
4. Answers to questions about which the informant is likely to be sensitive can be gathered by this method.
5. The wording of one or more questions can be altered to suit any informant, and explanations may be given in other languages. Inconvenience and misinterpretation are thereby avoided.

Limitations:
1. It is very costly and time consuming.
2. It is very difficult, when the number of persons to be interviewed is large and the persons are spread over a wide area.
3. Personal prejudice and bias are greater under this method.

2. Indirect Oral Interviews:
Under this method the investigator contacts witnesses, neighbours, friends or other third parties who are capable of supplying the necessary information. This method is preferred when the required information concerns addiction, or the cause of a fire, theft, murder, etc. If a fire has broken out at a certain place, the persons living in the neighbourhood and witnesses are likely to give information on the cause of the fire.
In some cases, police interrogate third parties who are supposed to have knowledge of a theft or a murder and get some clues. Enquiry committees appointed by governments generally adopt this method to get people’s views and all possible details of the facts relating to the enquiry. This method is suitable whenever direct sources do not exist, cannot be relied upon, or would be unwilling to part with the information.
The validity of the results depends upon a few factors, such as the nature of the person whose evidence is being recorded, the ability of the interviewer to draw out information from the third parties by means of appropriate questions and cross-examination, and the number of persons interviewed. For the success of this method, one person or one group alone should not be relied upon.

3. Information from correspondents:
The investigator appoints local agents or correspondents in different places and compiles the information sent by them. Information supplied to newspapers and to some departments of Government comes by this method. The advantage of this method is that it is cheap and appropriate for extensive investigation, but it may not ensure accurate results because the correspondents are likely to be negligent, prejudiced and biased. This method is adopted where information is to be collected periodically from a wide area over a long time.

4. Mailed questionnaire method:
Under this method a list of questions is prepared and sent to all the informants by post. The list of questions is technically called a questionnaire. A covering letter accompanying the questionnaire explains the purpose of the investigation and the importance of correct information, and requests the informants to fill in the blank spaces provided and to return the form within a specified time. This method is appropriate where the informants are literate and spread over a wide area.

Merits:
1. It is relatively cheap.
2. It is preferable when the informants are spread over a wide area.

Limitations:
1. The greatest limitation is that the informants must be literate, able to understand and reply to the questions.
2. It is possible that some of the persons who receive the questionnaires do not return them.
3. It is difficult to verify the correctness of the information furnished by the respondents.
With a view to minimizing non-response and collecting correct information, the questionnaire should be carefully drafted. There is no hard and fast rule, but the following general principles may be helpful in framing the questionnaire. A covering letter and a self-addressed, stamped envelope should accompany the questionnaire.
The covering letter should politely point out the purpose of the survey and the privilege of the respondent, who is one among the few associated with the investigation. It should give an assurance that the information will be kept confidential and never misused, and it may promise a copy of the findings, free gifts, concessions, etc.

Characteristics of a good questionnaire:
1. The number of questions should be kept to a minimum.
2. Questions should be in logical order, moving from easy to more difficult questions.
3. Questions should be short and simple. Technical terms and vague expressions capable of different interpretations should be avoided.
4. Questions fetching YES or NO answers are preferable; multiple-choice questions may be used, but questions requiring lengthy answers should be avoided.
5. Personal questions and questions which require memory power and calculations should also be avoided.
6. Questions should enable cross-checking, so that deliberate or unconscious mistakes can be detected to an extent.
7. Questions should be carefully framed so as to cover the entire scope of the survey.
8. The wording of the questions should be proper, without hurting feelings or arousing resentment.
9. As far as possible, confidential information should not be sought.
10. The physical appearance should be attractive, and sufficient space should be provided for answering each question.

5. Schedules sent through Enumerators:
Under this method enumerators or interviewers take the schedules, meet the informants and fill in their replies. A distinction is often made between a schedule and a questionnaire: a schedule is filled in by the interviewer in a face-to-face situation with the informant, while a questionnaire is filled in by the informant, who receives and returns it by post. This method is suitable for extensive surveys.

Merits:
1. It can be adopted even if the informants are illiterate.
2. Answers to questions of a personal and pecuniary nature can be collected.
3. Non-response is minimal, as the enumerators go personally and contact the informants.
4. The information collected is reliable, and the enumerators can be properly trained for the task.
5. It is one of the most popular methods.

Limitations:
1. It is the costliest method.
2. Extensive training has to be given to the enumerators for collecting correct and uniform information.
3. Interviewing requires experience; unskilled investigators are likely to fail in their work.

Before the actual survey, a pilot survey is conducted: the questionnaire/schedule is pre-tested. A few of the people from whom the actual information is needed are asked to reply. If they misunderstand a question, find it difficult to answer, or do not like its wording, it is to be altered. Further, it is to be ensured that every question fetches the desired answer.

Merits and Demerits of primary data:
1. Collection of data by personal survey is possible only if the area covered by the investigator is small. Collection of data by sending enumerators is bound to be expensive, and care should be taken that the enumerators record correctly the information provided by the informants.
2. Collection of primary data by framing schedules, or by distributing and collecting questionnaires by post, is less expensive and can be completed in a shorter time.
3. If the questions are embarrassing, of a complicated nature, or probe into the personal affairs of individuals, the schedules may not be filled with accurate and correct information, and hence this method is unsuitable.
4. The information collected as primary data is more reliable than that collected from secondary data.

Secondary Data:
Secondary data are those data which have been already collected and analysed by some earlier agency for its own use; and later the same data are used by a different agency.

According to W.A.Neiswanger, ‘A primary source is a publication in which the data are published by the same authority which gathered and analysed them. A secondary source is a publication, reporting the data which have been gathered by other authorities and for which others are responsible’.

Sources of Secondary data:
In most of the studies the investigator finds it impracticable to collect first-hand information on all related issues and as such he makes use of the data collected by others. There is a vast amount of published information from which statistical studies may be made and fresh statistics are constantly in a state of production. The sources of secondary data can broadly be classified under two heads:

1. Published sources, and
2. Unpublished sources.

1. Published Sources:
The various sources of published data are:
1. Reports and official publications of
(i) International bodies such as the International Monetary Fund, International Finance Corporation and United Nations Organisation.
(ii) Central and State Governments such as the Report of the Tandon Committee and Pay Commission.
2. Semi-official publication of various local bodies such as Municipal Corporations and District Boards.
3. Private publications-such as the publications of –
(i) Trade and professional bodies such as the Federation of Indian Chambers of Commerce and Institute of Chartered Accountants.
(ii) Financial and economic journals such as ‘Commerce’ , ‘Capital’ and ‘ Indian Finance’ .
(iii) Annual reports of joint stock companies.
(iv) Publications brought out by research agencies, research scholars, etc.

It should be noted that the publications mentioned above vary with regard to the periodicity of publication. Some are published at regular intervals (yearly, monthly, weekly, etc.), whereas others are ad hoc publications, i.e., with no regularity about the periodicity of publication.

Note: A lot of secondary data is available on the internet. We can access it at any time for further study.

2. Unpublished Sources
All statistical material is not always published. There are various sources of unpublished data, such as records maintained by various Government and private offices, and studies made by research institutions, scholars, etc. Such sources can also be used where necessary.

Precautions in the use of Secondary data:
The following are some of the points to be considered in the use of secondary data:
1. How the data has been collected and processed.
2. The accuracy of the data.
3. How far the data has been summarized.
4. How comparable the data is with other tabulations.
5. How to interpret the data, especially when figures collected for one purpose are used for another.
Generally speaking, with secondary data, people have to compromise between what they want and what they are able to find.

Merits and Demerits of Secondary Data:
1. Secondary data is cheap to obtain. Many government publications are relatively cheap, and libraries stock quantities of secondary data produced by the government, by companies and by other organisations.
2. Large quantities of secondary data can be obtained through the internet.
3. Much of the available secondary data has been collected for many years and can therefore be used to plot trends.
4. Secondary data is of value to:
- the government, in making decisions and planning future policy;
- business and industry, in areas such as marketing and sales, in order to appreciate the general economic and social conditions and to provide information on competitors;
- research organisations, by providing social, economic and industrial information.

May 06, 2017
Google Neural Networks Drawings
Google went big on art this week. The company launched a platform to help people who are terrible at art communicate visually. It also published research about teaching art to another terrible stick-figure drawer: a neural network.
On Tuesday, the company announced AutoDraw, a web-based service aimed at users who lack drawing talent. Essentially, the program allows you to use your finger (or mouse, if you're on a computer) to sketch out basic images like apples and zebras. Then it analyzes your rough drawing and suggests a professionally drawn version of the same thing. You click on the nice drawing you wanted, and it replaces yours with the better one. It's like autocorrect, but for drawing.
Nooka Jones, the team lead at Google’s creative lab, says that AutoDraw is about helping people express themselves. “A lot of people are fairly bad at drawing, but it shouldn’t limit them from being able to communicate visually,” he says. “What if we could help people sketch out their ideas, or bring their ideas to life, through visual communication, with the idea of machine learning?”
The system’s underlying tech has its roots in a surprising place, according to Dan Motzenbecker, a creative technologist at Google. “It’s a neural network that’s actually originally devised to recognize handwriting,” he says. That handwriting could be Latin script, or Chinese or Japanese characters, like kanji. From there, “it’s not that big of a leap to go to a doodle.”
As people make their line drawings, the network tries to figure out what they are. “The same way that might work for a letter of the alphabet, or a Chinese character,” Motzenbecker says, “we can use that for a doodle of a toaster.”
Neural networks get better by learning from data, but when asked about how and if the system is learning from our drawings, Jones says: “In theory, yes; we don’t quite disclose what we actually use as input back into the algorithm.”
Just like there are different ways to draw a letter, there are multiple representations of an elephant or a horse. “The more variety it sees,” Motzenbecker says, “the more adaptable it is to seeing novel ways of sketching things.” Users are also confirming the AI’s guesses when selecting a new drawing, which could help to guide its future decisions.
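Google has not published AutoDraw's internals, but the loop it describes (guess what the doodle is, then offer professionally drawn matches) can be roughed out as below; the classifier, labels and artwork catalogue are placeholders, not Google's code.

from typing import Callable, List, Tuple

# Placeholder catalogue of professionally drawn artwork, keyed by label.
CATALOGUE = {"cat": ["cat_01.svg", "cat_02.svg"], "toaster": ["toaster_01.svg"]}

def suggest_drawings(strokes: List[list],
                     classify: Callable[[List[list]], List[Tuple[str, float]]],
                     top_k: int = 3) -> List[str]:
    """Return clean drawings whose labels match the classifier's top guesses."""
    suggestions = []
    for label, confidence in classify(strokes)[:top_k]:
        if confidence > 0.1:                      # ignore very unsure guesses
            suggestions.extend(CATALOGUE.get(label, []))
    return suggestions

# A stand-in classifier that always guesses "cat"; a real system would run a neural network here.
fake_classifier = lambda s: [("cat", 0.91), ("toaster", 0.05)]
print(suggest_drawings([[0, 0], [10, 10]], classify=fake_classifier))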
“One of the things that you see across the entire industry, and Google has recognized the potential of this much earlier than most other technology companies,” says Shuman Ghosemajumder, the chief technology officer at Shape Security in Mountain View, Calif., and a former Google employee, “is the use of machine learning to be able to do things that were previously thought to require direct human intervention.” And machine learning models need data.
“In this case, if you’ve got an app that millions of people potentially will use to be able to attempt to draw different figures,” he adds, “even if your technology isn’t perfect right now, you are creating this amazing training set of input data that can be used to improve these models over time.”
While AutoDraw is about helping people turn their doodles into more recognizable images, the search giant is also interested in how computers draw. On Thursday, Google Research published a blog post and paper about how they had schooled a recurrent neural network to draw items like cats and pigs.
The research team's goal was to train “a machine to draw and generalize abstract concepts in a manner similar to humans,” according to a blog item written by David Ha, a Google Brain Resident. The system works by taking human input—say, a drawing of a cat or just the word “cat,” according to a Google spokesperson—and then making its own drawing.
The results are fascinating and bizarre. In one example, the researchers presented the system with a sketch of a three-eyed cat. The computer drew its own cat, but this one had the right number of eyes, “suggesting that our model has learned that cats usually only have two eyes.”
In another, when presented with a picture of a toothbrush, the Google neural network’s cat model made a Picasso-like feline that still had a toothbrush-inspired feel to it.
A Google spokesperson confirmed that it is different neural networks that are powering AutoDraw and the other research, but the similarities are striking: in both cases, the system is drawing on machine learning to take a piece of input and then either suggest a professionally-drawn image, or create something new totally on its own.
April 15, 2017
In this industry, it's a tired old cliche to say that we're building the future. But that's true now more than at any time since the Industrial Revolution. The proliferation of personal computers, laptops, and cell phones has changed our lives, but by replacing or augmenting systems that were already in place. Email supplanted the post office; online shopping replaced the local department store; digital cameras and photo sharing sites such as Flickr pushed out film and bulky, hard-to-share photo albums. AI presents the possibility of changes that are fundamentally more radical: changes in how we work, how we interact with each other, how we police and govern ourselves.

Fear of a mythical "evil AI" derived from reading too much sci-fi won't help. But we do need to ensure that AI works for us rather than against us; we need to think ethically about the systems that we're building. Microsoft's CEO, Satya Nadella, writes:
The debate should be about the values instilled in the people and institutions creating this technology. In his book Machines of Loving Grace, John Markoff writes, 'The best way to answer the hard questions about control in a world full of smart machines is by understanding the values of those who are actually building these systems.' It's an intriguing question, and one that our industry must discuss and answer together.
What are our values? And what do we want our values to be? Nadella is deeply right in focusing on discussion. Ethics is about having an intelligent discussion, not about answers, as such—it's about having the tools to think carefully about real-world actions and their effects, not about prescribing what to do in any situation. Discussion leads to values that inform decision-making and action.
The word "ethics" comes from "ethos," which means character: what kind of a person you are. "Morals" comes from "mores," which basically means customs and traditions. If you want rules that tell you what to do in any situation, that's what customs are for. If you want to be the kind of person who executes good judgment in difficult situations, that's ethics. Doing what someone tells you is easy. Exercising good judgement in difficult situations is a much tougher standard.
Exercising good judgement is hard, in part, because we like to believe that a right answer has no bad consequences; but that's not the kind of world we have. We've damaged our sensibilities with medical pamphlets that talk about effects and side effects. There are no side effects; there are just effects, some of which you might not want. All actions have effects. The only question is whether the negative effects outweigh the positive ones. That's a question that doesn't have the same answer every time, and doesn't have to have the same answer for every person. And doing nothing because thinking about the effects makes us uncomfortable is, in fact, doing something.
The effects of most important decisions aren't reversible. You can't undo them. The myth of Pandora's box is right: once the box is opened, you can't put the stuff that comes out back inside. But the myth is right in another way: opening the box is inevitable. It will always be opened; if not by you, by someone else. Therefore, a simple "we shouldn't do this" argument is always dangerous, because someone will inevitably do it, for any possible "this." You may personally decide not to work on a project, but any ethics that assumes people will stay away from forbidden knowledge is a failure. It's far more important to think about what happens after the box has been opened. If we're afraid to do so, we will be the victims of whoever eventually opens the box.
Finally, ethics is about exercising judgement in real-world situations, not contrived situations and hypotheticals. Hypothetical situations are of very limited use, if not actually harmful. Decisions in the real world are always more complex and nuanced. I'm completely uninterested in whether a self-driving car should run over the grandmothers or the babies. An autonomous vehicle that can choose which pedestrian to kill surely has enough control to avoid the accident altogether. The real issue isn't who to kill, where either option forces you into unacceptable positions about the value of human lives, but how to prevent accidents in the first place. Above all, ethics must be realistic, and in our real world, bad things happen.
That's my rather abstract framework for an ethics of AI. I don't want to tell data scientists and AI developers what to do in any given situation. I want to give scientists and engineers tools for thinking about problems. We surely can't predict all the problems and ethical issues in advance; we need to be the kind of people who can have effective discussions about these issues as we anticipate and discover them.

Talking through some issues

What are some of the ethical questions that AI developers and researchers should be thinking about? Even though we're still in the earliest days of AI, we're already seeing important issues rise to the surface: issues about the kinds of people we want to be, and the kind of future we want to build. So, let's look at some situations that made the news.

Pedestrians and passengers

The self-driving car/grandmother versus babies thing is deeply foolish, but there's a variation of it that's very real. Should a self-driving car that's in an accident situation protect its passengers or the people outside the car? That's a question that is already being discussed in corporate board rooms, as it was at Mercedes recently, which decided that the company's duty was to protect the passengers rather than pedestrians. I suspect that Mercedes' decision was driven primarily by accounting and marketing: who will buy a car that will sacrifice the owner to avoid killing a pedestrian? But Mercedes made an argument that's at least ethically plausible: they have more control over what happens to the person inside the car, so better to save the passenger than to roll the dice on the pedestrians. One could also argue that Mercedes has an ethical commitment to the passengers, who have put their lives in the hands of its AI systems.
The bigger issue is to design autonomous vehicles that can handle dangerous situations without accidents. That's the real ethical choice. How do you trade off cost, convenience, and safety? It's possible to make cars that are more safe or less safe; AI doesn't change that at all. It's impossible to make a car (or anything else) that's completely safe, at any price. So, the ethics here ultimately come down to a tradeoff between cost and safety, to ourselves and to others. How do we value others? Not grandmothers or babies (who will inevitably be victims, just as they are now, though hopefully in smaller numbers), but passengers and pedestrians, Mercedes' customers and non-customers? The answers to these questions aren't fixed, but they do say something important about who we are.

Crime and punishment

COMPAS is commercial software used in many state courts to recommend prison sentences, bail terms, and parole. In 2016, ProPublica published an excellent article showing that COMPAS consistently scores blacks as greater risks for re-offending than whites who committed similar or more serious crimes.
Although COMPAS has been secretive about the specifics of their software, ProPublica published the data on which their reports were based. Abe Gong, a data scientist, followed up with a multi-part study, using ProPublica's data, showing that the COMPAS results were not "biased." Abe is very specific: he means "biased" in a technical, statistical sense. Statistical bias is a statement about the relationship between the outputs (the risk scores) and the inputs (the data). It has little to do with whether we, as humans, think the outputs are fair.
Abe is by no means an apologist for COMPAS or its developers. As he says, "Powerful algorithms can be harmful and unfair, even when they're unbiased in a strictly technical sense." The results certainly had disproportionate effects that most of us would be uncomfortable with. In other words, they were "biased" in the non-technical sense. "Unfair" is a better word that doesn't bring in the trappings of statistics.
The output of a program reflects the data that goes into it. "Garbage in, garbage out" is a useful truism, especially for systems that build models based on terabytes of training data. Where does that data come from, and does it embody its own biases and prejudices? A program's analysis of the data may be unbiased, but if the data reflects arrests, and if police are more likely to arrest black suspects, while letting whites off with a warning, a statistically unbiased program will necessarily produce unfair results. The program also took into account factors that may be predictive, but that we might consider unfair: is it fair to set a higher bail because the suspect's parents separated soon after birth, or because the suspect didn't have access to higher education?
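A toy calculation (with entirely invented numbers, not COMPAS data) shows how a score can faithfully track the data it was trained on and still flag one group's non-reoffenders far more often than another's:

# Invented records: (group, flagged_high_risk, actually_reoffended).
records = [
    ("A", True, True), ("A", True, False), ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", False, False), ("B", False, False), ("B", False, False),
]

for group in ("A", "B"):
    non_reoffenders = [r for r in records if r[0] == group and not r[2]]
    false_positives = [r for r in non_reoffenders if r[1]]
    rate = len(false_positives) / len(non_reoffenders)
    print(f"Group {group}: flagged {rate:.0%} of people who did not reoffend")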
There's not a lot that we can do about bias in the data: arrest records are what they are, and we can't go back and un-arrest minority citizens. But there are other issues at stake here. As I've said before, I'm much more concerned about what happens behind closed doors than what happens in the open. Cathy O'Neil has frequently argued that secret algorithms and secret data models are the real danger. That's really what COMPAS shows. It is almost impossible to discuss whether a system is unfair if we don't know what the system is and how it works. We don't just need open data; we need to open up the models that are built from the data.
COMPAS demonstrates, first, that we need a discussion about fairness, and what that means. How do we account for the history that has shaped our statistics, a history that was universally unfair to minorities? How do we address bias when our data itself is biased? But we can't answer these questions if we don't also have a discussion about secrecy and openness. Openness isn't just nice; it's an ethical imperative. Only when we understand what the algorithms and the data are doing, can we take the next steps and build systems that are fair, not just statistically unbiased.

Child labor

One of the most penetrating remarks about the history of the internet is that it was "built on child labor." The IPv4 protocol suite, together with the first implementations of that suite, was developed in the 1980s, and was never intended for use as a public, worldwide, commercial network. It was released well before we understood what a 21st century public network would need. The developers couldn't foresee more than a few tens of thousands of computers on the internet; they didn't anticipate that it would be used for commerce, with stringent requirements for security and privacy; putting a system on the internet was difficult, requiring handcrafted static configuration files. Everything was immature; it was "child labor," technological babies doing adult work.
Now that we're in the first stages of deploying AI systems, the stakes are even higher. Technological readiness is an important ethical issue. But like any real ethical issue, it cuts both ways. If the public internet had waited until it was "mature," it probably would never have happened; if it had happened, it would have been an awful bureaucratic mess, like the abandoned ISO-OSI protocols, and arguably no less problematic. Unleashing technological children on the world is irresponsible, but preventing those children from growing up is equally irresponsible.
To move that argument to the 21st century: my sense is that Uber is pushing the envelope too hard on autonomous vehicles. And we're likely to pay for that—in vehicles that perhaps aren't as safe as they should be, or that have serious security vulnerabilities. (In contrast, Google is being very careful, and that care may be why they've lost some key people to Uber.) But if you go to the other extreme and wait until autonomous vehicles are "safe" in every respect, you're likely to end up with nothing: the technology will never be deployed. Even if it is deployed, you will inevitably discover risk factors that you didn't foresee, and couldn't have foreseen without real experience.
I'm not making an argument about whether autonomous vehicles, or any other AI, are ready to be deployed. I'm willing to discuss that, and if necessary, to disagree. What's more important is to realize that this discussion needs to happen. Readiness itself is an ethical issue, and one that we need to take seriously. Ethics isn't simply a matter of saying that any risk is acceptable, or (on the other hand) that no risk is acceptable. Readiness is an ethical issue precisely because it isn't obvious what the "right" answer is, or whether there is any "right" answer. Is it an "ethical gray area"? Yes, but that's precisely what ethics is about: discussing the gray areas.

The state of surveillance

In a chilling article, The Verge reports that police in Baltimore used a face identification application called Geofeedia, together with photographs shared on Instagram, Facebook, and Twitter, to identify and arrest protesters. The Verge's report is based on a more detailed analysis by the ACLU. Instagram and the other companies quickly terminated Geofeedia's account after the news went public, though they willingly provided the data before it was exposed by the press.
Applications of AI to criminal cases quickly get creepy. We should all be nervous about the consequences of building a surveillance state. People post pictures to Instagram without thinking of the consequences, even when they're at demonstrations. And, while it's easy to say "anything you post should be assumed to be public, so don't post anything that you wouldn't want anyone to see," it's difficult, if not impossible, to think about all the contexts in which your posts can be put.
The ACLU suggests putting the burden on the social media companies: social media companies should have "clear, public, and transparent policies to prohibit developers from exploiting user data for surveillance." Unfortunately, this misses the point: just as you can't predict how your posts will be used or interpreted, who knows the applications to which software will be put? If we only have to worry about software that's designed for surveillance, our task is easy. It's more likely, though, that applications designed for innocent purposes, like finding friends in crowds, will become parts of surveillance suites.
The problem isn't so much the use or abuse of individual Facebook and Instagram posts, but the scale that's enabled by AI. People have always seen other people in crowds, and identified them. Law enforcement agencies have always done the same. What AI enables is identification at scale: matching thousands of photos from social media against photos from drivers' license databases, passport databases, and other sources, then taking the results and crossing them with other kinds of records. Suddenly, someone who participates in a demonstration can find themselves facing a summons over an old parking ticket. Data is powerful, and becomes much more powerful when you combine multiple data sources.
We don't want people to be afraid of attending public gatherings, or in terror that someone might take a photo of them. (A prize goes to anyone who can find me on the cover of Time. These things happen.) But it's also unreasonable to expect law enforcement to stick to methodologies from the 80s and earlier: crime has certainly moved on. So, we need to ask some hard questions—and "should law enforcement look at Instagram" is not one of them. How does automated face recognition at scale change the way we relate to each other, and are those changes acceptable to us? Where's the point at which AI becomes harassment? How will law enforcement agencies be held accountable for the use, and abuse, of AI technologies? Those are the ethical questions we need to discuss.

Our AIs are ourselves

Whether it's fear of losing jobs or fear of a superintelligence deciding that humans are no longer necessary, it's always been easy to conjure up fears of artificial intelligence.
But marching to the future in fear isn't going to end well. And unless someone makes some fantastic discoveries about the physics of time, we have no choice but to march into the future. For better or for worse, we will get the AI that we deserve. The bottom line of AI is simple: to build better AI, be better people.
That sounds trite, and it is trite. But it's also true. If we are unwilling to examine our prejudices, we will implement AI systems that are "unfair" even if they're statistically unbiased, merely because we won't have the interest to examine the data on which the system is trained. If we are willing to live under an authoritarian government, we will build AI systems that subject us to constant surveillance: not just through Instagrams of demonstrations, but in every interaction we take part in. If we're slaves to a fantasy of wealth, we won't object to entrepreneurs releasing AI systems before they're ready, nor will we object to autonomous vehicles that preferentially protect the lives of those wealthy enough to afford them.
But if we insist on open, reasoned discussion of the tradeoffs implicit in any technology; if we insist that both AI algorithms and models are open and public; and if we don't deploy technology that is grossly immature, but also don't suppress new technology because we fear it, we'll be able to have a healthy and fruitful relationship with the AIs we develop. We may not get what we want, but we'll be able to live with what we get.
Walt Kelly said it best, back in 1971: "we have met the enemy and he is us." In a nutshell, that's the future of AI. It may be the enemy, but only if we make it so. I have no doubt that AI will be abused and that "evil AI" (whatever that may mean) will exist. As Tim O'Reilly has argued, large parts of our economy are already managed by unintelligent systems that aren't under our control in any meaningful way. But evil AI won't be built by people who think seriously about their actions and the consequences of their actions. We don't need to foresee everything that might happen in the future, and we won't have a future if we refuse to take risks. We don't even need complete agreement on issues such as fairness, surveillance, openness, and safety. We do need to talk about these issues, and to listen to each other carefully and respectfully. If we think seriously about ethical issues and build these discussions into the process of developing AI, we'll come out OK.
To create better AI, we must be better people.
February 18, 2017
After knowing the relationship between two variables we may be interested in estimating (predicting) the value of one variable given the value of another. The variable predicted on the basis of other variables is called the “dependent” or the ‘explained’ variable and the other the ‘independent’ or the ‘predicting’ variable. The prediction is based on average relationship derived statistically by regression analysis. The equation, linear or otherwise, is called the regression equation or the explaining equation.

For example, if we know that advertising and sales are correlated we may find out expected amount of sales for a given advertising expenditure or the required amount of expenditure for attaining a given amount of sales.

A relationship between two variables can be considered between, say, rainfall and agricultural production, the price of an input and the overall cost of the product, or consumer expenditure and disposable income. Thus, regression analysis reveals the average relationship between two variables, and this makes estimation or prediction possible.

 Definition: 

Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.

Types Of Regression:

The regression analysis can be classified into:
 a) Simple and Multiple
b) Linear and Non-Linear
c) Total and Partial

a) Simple and Multiple: 

In case of simple relationship only two variables are considered, for example, the influence of advertising expenditure on sales turnover. In the case of multiple relationship, more than two variables are involved.

Here, while one variable is the dependent variable, the remaining variables are independent ones. For example, the turnover (y) may depend on advertising expenditure (x) and the income of the people (z). Then the functional relationship can be expressed as y = f(x, z).

b) Linear and Non-linear: 

Linear relationships are based on a straight-line trend, the equation of which has no power higher than one. But remember, a linear relationship can be both simple and multiple. Normally a linear relationship is taken into account because, besides its simplicity, it has better predictive value: a linear trend can be easily projected into the future. In the case of a non-linear relationship, curved trend lines are derived. The equations of these are parabolic.

c) Total and Partial: 

In the case of total relationships all the important variables are considered. Normally, they take the form of multiple relationships because most economic and business phenomena are affected by a multiplicity of causes. In the case of a partial relationship one or more variables are considered, but not all, thus excluding the influence of those not found relevant for a given purpose.

Linear Regression Equation: 

If two variables have a linear relationship, then as the independent variable (X) changes, the dependent variable (Y) also changes. If the different values of X and Y are plotted, then two straight lines of best fit can be made to pass through the plotted points. These two lines are known as regression lines. Again, these regression lines are based on two equations known as regression equations. These equations show the best estimate of one variable for a known value of the other. The equations are linear.
The linear regression equation of Y on X is
Y = a + bX … (1)
and that of X on Y is
X = a + bY … (2)
where a and b are constants (their values differ in the two equations).

From (1) we can estimate Y for a known value of X, and from (2) we can estimate X for a known value of Y.
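As a rough illustration of equation (1), the constants a and b can be estimated from sample data with a least-squares fit; the advertising and sales figures below are invented.

import numpy as np

# Invented sample data: advertising expenditure (X) and sales turnover (Y).
X = np.array([10, 20, 30, 40, 50], dtype=float)
Y = np.array([25, 38, 56, 70, 91], dtype=float)

# Fit Y = a + bX by least squares (np.polyfit returns [b, a] for degree 1).
b, a = np.polyfit(X, Y, 1)
print(f"Y = {a:.2f} + {b:.2f} X")

# Estimate Y for a known value of X, as in (1).
x_new = 35.0
print("Estimated sales when advertising is 35:", round(a + b * x_new, 2))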
February 18, 2017