Misinformation detection

Exploring various methods to detect fake news

by Ved Shah

We live in a world where we are constantly exposed to new information in the form of articles from a very wide catalog of news providers. This causes information fatigue, and we are highly likely to forward links without verifying the source or contents of an article, especially because the ad-based revenue model on news websites often relies on attention-seeking titles and sensationalism to generate income, frequently to the detriment of society.

In recent years, this has led to incidents of lynching in India caused by the easy propagation of fake information across various social media outlets. The topic has been actively debated in both houses of Parliament, and while regulatory reforms are necessary, enacting them while preserving any reasonable ideal of free speech will undoubtedly require massive investment in technological architecture to accurately analyze every article or piece of information.

In this project, I focused exclusively on web articles posted on news sites targeting the average citizen. This enables an analysis of a myriad of attributes, including but not limited to type of domain, length of URL, website description, spelling checks, keyword count and frequency, use of particular phrases, and number of redirections. Additional details are provided in the Python notebook at the relevant places alongside the code.
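To make the attribute list concrete, here is a minimal sketch of what URL-level feature extraction can look like. The function name, feature names, and the example URL are illustrative assumptions, not the exact ones used in the notebook.

```python
from urllib.parse import urlparse

def extract_url_features(url: str) -> dict:
    """Hypothetical helper: derive simple features from an article URL."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    return {
        "url_length": len(url),                      # longer URLs can correlate with clickbait
        "domain": parsed.netloc,                     # e.g. "news.example.com"
        "is_https": parsed.scheme == "https",        # secure scheme as a weak trust signal
        "num_subdomains": parsed.netloc.count("."),  # rough proxy for subdomain depth
        "path_depth": path.count("/") + 1 if path else 0,
    }

features = extract_url_features("https://news.example.com/world/story-123")
```

Features like these can be fed alongside text-based features into whatever classifier is being trained.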

I have attempted training and validating the data set with five different methods. Some were just permutations of existing features, while others, like GloVe embeddings, were full-fledged additions to the model. The results are summarized in the charts below.

While comparing accuracy and precision across the different ML models, some discrepancies were observed. I reached a point of negative returns with Word2Vec as compared to bag-of-words, despite its more robust mechanism. This might be due to the limited training data, but without access to an alternate controlled environment, this claim remains unverified.
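For readers unfamiliar with the bag-of-words baseline mentioned above, here is a minimal stdlib-only sketch of the representation; embedding methods like Word2Vec or GloVe replace these sparse counts with averaged dense vectors. The vocabulary and sample text are made up for illustration and are not from the notebook.

```python
from collections import Counter

def bag_of_words(text: str, vocab: list[str]) -> list[int]:
    """Represent a document as counts over a fixed vocabulary (illustrative)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

vocab = ["breaking", "shocking", "report"]
vec = bag_of_words("BREAKING shocking shocking news report", vocab)
# vec -> [1, 2, 1]
```

A real pipeline would tokenize more carefully (punctuation, stemming, stop words) and hand the vectors to a classifier, but the core idea is exactly this counting step.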

At its best, I reached a validation accuracy of 89.8% and an F-score of 0.94, a result that in my limited experience was reproducible in the real world: 17 of 20 samples returned accurate results, even with tricky websites like OpenAI and The Onion.
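For reference, accuracy and F-score follow directly from the confusion-matrix counts. The counts below are hypothetical and chosen only to show the arithmetic, not the project's actual numbers.

```python
def accuracy_and_f1(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Compute accuracy and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # of predicted-fake, how many were fake
    recall = tp / (tp + fn)      # of actual-fake, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Made-up counts: 90 true positives, 8 false positives,
# 85 true negatives, 12 false negatives.
acc, f1 = accuracy_and_f1(tp=90, fp=8, tn=85, fn=12)
```

F1 is the harmonic mean of precision and recall, which is why it can stay high even when accuracy dips, and vice versa.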

The notebook is available for download here.

Please leave comments about your experience with it, and let me know if you notice any strange biases during your testing.
