Channel: The most updated car database

Data scraping services

For many years, manual data entry in Excel (sourcing from books, as seen in this video) or manually copy-pasting from websites was my only way of creating databases. These were time-consuming jobs that limited the size of the databases I could make. Even so, I made about 40 databases about cars, geography, real estate, gaming, etc., purely as a hobby.

Since 2015 I have offered web scraping services in ANY field, not limited to automobiles. Scraping usually means running software that visits a given list of pages, copies data from each page, and puts it into a database automatically. If you need something very different from my current databases, I can create new ones as long as you provide a source of data: a website to extract the data from. Do not expect me to get data that you cannot find yourself in any form, for example:

– Do not expect me to compile a table with the dimensions of car lights, bumpers, windows, etc. Such dimensions are not provided in car manuals. If you sell such car parts, measuring your own parts yourself is the only solution.

– Do not expect me to compile a table with the number of cars sold in your country, broken down by model, if your government does not track sales and publish them on the internet. Data needs to be available somewhere in order to scrape it and put it into a database.

– Theoretically I can scrape data from any website. However, only websites that present the particular data you are interested in with a consistent structure from page to page can produce a good, usable database. After automatic scraping, some manual work (more or less, depending on the site) is needed to make the database usable.
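
The sketch below illustrates the extract-and-save step described above, using only the Python standard library. It assumes, hypothetically, that every item page lists its data as table rows of the form `<th>label</th><td>value</td>`. The example URLs and labels, and the `pages_to_csv` helper, are illustrative only. The same columns are read from every page, so a site with a consistent structure fills every cell. Pages where a column comes back empty are collected separately for the manual clean-up mentioned above.

```python
import csv
import io
from html.parser import HTMLParser


class SpecTableParser(HTMLParser):
    """Collects <th>label</th><td>value</td> pairs from one page.

    Hypothetical assumption: every item page shows its data as table
    rows with a label cell followed by a value cell.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cell = None   # "th" or "td" while inside such a cell
        self._label = None  # last label seen, waiting for its value
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = "".join(self._text).strip()
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._cell = None


def pages_to_csv(pages, columns):
    """Extract the same columns from every (url, html) page into CSV text.

    Also returns the URLs of pages missing any column: those are the
    rows that still need manual checking.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["url"] + columns)
    writer.writeheader()
    needs_review = []
    for url, html in pages:
        parser = SpecTableParser()
        parser.feed(html)
        parser.close()
        row = {"url": url}
        row.update({c: parser.fields.get(c, "") for c in columns})
        writer.writerow(row)
        if any(row[c] == "" for c in columns):
            needs_review.append(url)
    return out.getvalue(), needs_review


# Usage with two hypothetical pages; the second lacks a "Power" row.
pages = [
    ("https://example.com/car/1",
     "<table><tr><th>Model</th><td>Alpha</td></tr>"
     "<tr><th>Power</th><td>110 hp</td></tr></table>"),
    ("https://example.com/car/2",
     "<table><tr><th>Model</th><td>Beta</td></tr></table>"),
]
csv_text, needs_review = pages_to_csv(pages, ["Model", "Power"])
print(csv_text)
print("check manually:", needs_review)
```

Fetching the pages (for example with `urllib.request.urlopen`) is left out so that the sketch runs offline.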

Easy data scraping: each item has a separate URL, and there is an index page with links to each item (instead of drop-down boxes). A few tools are available online, some as free downloads and some by paid subscription. However, they only extract data from pages; they do not discover pages. In this case I can help you by making a list of URLs and then running the extractor. Price: a few dozen to a hundred dollars, depending on the number of pages that need data extraction.
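
For the easy case, discovering pages means reading the index page and collecting the links that point to item pages. The sketch below is a hypothetical illustration, not the layout of any real site. The `/bikes/` path prefix, which separates item links from navigation links, is an assumption, as are the example.com URLs and the `item_urls` helper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects every non-empty href found in <a> tags, in page order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def item_urls(index_url, index_html, path_prefix):
    """Return absolute, de-duplicated item URLs in the order they appear.

    path_prefix is a hypothetical filter (e.g. "/bikes/") that keeps item
    links and drops navigation, ad, and footer links.
    """
    collector = LinkCollector()
    collector.feed(index_html)
    collector.close()
    seen, urls = set(), []
    for href in collector.hrefs:
        absolute = urljoin(index_url, href)  # resolve relative links
        if urlparse(absolute).path.startswith(path_prefix) and absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


# Usage with a hypothetical index page containing a duplicate link.
index_html = (
    '<a href="/bikes/alpha">Alpha</a> <a href="/about">About us</a>'
    '<a href="/bikes/beta">Beta</a> <a href="/bikes/alpha">Alpha again</a>'
)
urls = item_urls("https://example.com/bikes/", index_html, "/bikes/")
print(urls)
```

The resulting list is what gets handed to an extractor that only knows how to read pages.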

Difficult data scraping: the website has drop-down lists, search boxes, or JavaScript code, and requires the user to perform some actions to reach the page with the data you want to scrape. In this case we need a custom scraper written in PHP or Visual Basic. Price: a few hundred dollars, depending on how much coding is necessary.
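
With drop-down boxes, a custom scraper typically reads the option values from each `<select>` element and sends one request per combination. The sketch below uses Python rather than the PHP or Visual Basic mentioned above. It assumes a hypothetical form whose drop-downs are independent and are submitted by GET to a made-up `https://example.com/price` URL. The field names and the `request_urls` helper are also illustrative. Sites where one list depends on another, or that submit data through JavaScript, need more work than this.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode


class SelectOptions(HTMLParser):
    """Maps each named <select> to the list of its option values."""

    def __init__(self):
        super().__init__()
        self.options = {}
        self._current = None  # name of the <select> we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._current = attrs.get("name")
            if self._current:
                self.options.setdefault(self._current, [])
        elif tag == "option" and self._current and attrs.get("value"):
            # Placeholders like <option value="">Choose</option> and options
            # without a value attribute are skipped in this sketch.
            self.options[self._current].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def request_urls(form_html, action_url):
    """Yield one GET URL per combination of drop-down values."""
    parser = SelectOptions()
    parser.feed(form_html)
    parser.close()
    names = list(parser.options)
    for values in product(*(parser.options[n] for n in names)):
        yield action_url + "?" + urlencode(dict(zip(names, values)))


# Usage with a hypothetical form: 2 models x 2 cities = 4 requests.
form_html = (
    '<select name="model"><option value="">Choose</option>'
    '<option value="alpha">Alpha</option><option value="beta">Beta</option></select>'
    '<select name="city"><option value="delhi">Delhi</option>'
    '<option value="pune">Pune</option></select>'
)
urls = list(request_urls(form_html, "https://example.com/price"))
for url in urls:
    print(url)
```

The number of requests is the product of the list sizes, which is why jobs like cars × cities grow large quickly.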

Impossible data scraping: car classifieds websites usually hide the seller's phone number and contact email behind a button that reveals them when clicked. This is done specifically to prevent scraping and to protect the emails from spam. The only solution is to have a human visit each page and copy-paste this hidden data, which requires a large amount of time. Suppose you are an insurance company planning SMS or email marketing, and you intend to hire me to build a database of car owners from classifieds websites. Most likely I can’t help you, because of the time required.

Is data scraping legal?

It depends. If the data is added by volunteers, or by sellers on classifieds websites, scraping is probably legal. But if the website's authors worked hard to compile data from sources like car brochures, scraping is probably illegal. This is especially true if you use their data to build your own website or for another commercial purpose. Many websites contain dummy data (for example, a bunch of cars listed with 1 horsepower more or less than the official value). If you use data copied from them, they can prove where you stole the data from and sue you. BEWARE!
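
Planted values work as a fingerprint. Wherever the site deliberately differs from the official figure, a copy that repeats the same wrong figure points back to that site. The sketch below shows such a check with made-up model names and horsepower values. `copied_trap_values` is a hypothetical helper, not a tool that any particular website is known to use.

```python
def copied_trap_values(official, published, suspect):
    """Return keys where `suspect` repeats a deliberate error in `published`.

    All three arguments map a record key (e.g. a model name) to a value
    such as horsepower. A "trap" is a record where the published value
    differs from the official one; matching it is evidence of copying.
    """
    traps = [k for k, v in published.items() if k in official and v != official[k]]
    return [k for k in traps if suspect.get(k) == published[k]]


# Hypothetical data: the site altered "A" (+1 hp) and "C" (-1 hp).
official = {"A": 100, "B": 150, "C": 90}
published = {"A": 101, "B": 150, "C": 89}
suspect = {"A": 101, "B": 150, "C": 90}

print(copied_trap_values(official, published, suspect))
```

An independently compiled dataset matches the official values, so it does not repeat any trap. A copied one repeats them.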

One day, someone offered to sell me a car database. He claimed he had created it by working 8 hours per day for 4 months, copy-pasting data from a website. From a copyright point of view, it does not matter whether you scraped the data automatically or typed it manually: your work is not original. He was probably not aware of scraping software. If you wasted a few months doing something that could have been done in a few hours, you’re an idiot. (I was an idiot too, doing such jobs before 2015 without being aware of scraping software, but only small jobs.) I still do this manual work for the European database, because I source its data from books (offline sources), which makes it an original product on the web.

For a moment I became concerned that my European database might be a copyright violation. However, I concluded that it is fine, as long as mine is an original product with a different data structure from the book and it targets an online audience, while the AutoKatalog is a book sold in shops to car hobbyists. I make over 100 sales each year, and not a single customer has raised a concern about copyright.

Examples of completed data scraping projects

Scraping software typically saves data in CSV format. When publishing on the website, however, I save it as XLS and add borders, colors, headers, and other visual enhancements to match the style of other products “Made by Teoalida”.

India Car Database – source: www.carwale.com – Made out of personal interest, because numerous people asked me about an Indian car database. It was my first scraping project, so it initially took about 7 days to figure out how to do it; later I found I could do it in 2 days.

India Bike Database – source: www.bikewale.com – Made after a second person requested a database of bikes sold in India. One of the easiest projects, with no drop-down boxes, just plain links to each bike page. 250 records, price: 25 euro.

CarWale On-Road Prices – source: www.carwale.com – Made for a customer. A difficult project: it took about 20 hours of work in Visual Basic to build an application that sends JavaScript requests to the CarWale website to get the price of each car in each city. The application works at a rate of 2 requests per second. That gives 3,200 cars × 510 cities = 1,632,000 requests, or 816,000 seconds: about 226 hours needed to get all on-road prices, RTO tax, and insurance. Price: $300 USD, of which $200 went to the programmer and $100 was my fee for keeping the scraping application running daily for a month.
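
The run-time estimate works as follows:

- requests = cars × cities
- seconds = requests ÷ (requests per second)

Plugging in the figures from the paragraph above shows that 1,632,000 is a count of requests rather than seconds. That count, together with the 226-hour total, corresponds to 3,200 cars. 3,100 cars would give 1,581,000 requests, or about 220 hours. `scrape_hours` is an illustrative helper, not part of the original application.

```python
def scrape_hours(items, variants, requests_per_second):
    """Return (requests, seconds, hours) for one request per (item, variant) pair."""
    requests = items * variants
    seconds = requests / requests_per_second
    return requests, seconds, seconds / 3600


# Compare both car counts at 510 cities and 2 requests per second.
for cars in (3100, 3200):
    requests, seconds, hours = scrape_hours(cars, 510, 2)
    print(f"{cars} cars: {requests:,} requests, {seconds:,.0f} s, {hours:.1f} h")
# 3100 cars: 1,581,000 requests, 790,500 s, 219.6 h
# 3200 cars: 1,632,000 requests, 816,000 s, 226.7 h
```

At roughly 220 to 227 hours of continuous running, a job like this spans more than nine days. This fits keeping the application running daily over a month.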

Skyscrapers Database – source: www.emporis.com – Made out of personal interest; it turned into a marketing failure, with no sales in 6 months. It took about 20 hours to manually compile a list of cities with buildings over 100 meters, then a list of buildings in these cities. I then used software to automatically extract data about each building. 15,000+ buildings. Emporis blocks my IP for 2 days if I access more than 3,000 pages in one day. So data extraction was limited to 3,000 buildings per day, which took about 1 hour daily for 6 days.
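
Staying under a per-day page limit means splitting the URL list into daily batches. The sketch below assumes a hypothetical URL pattern. It uses an illustrative total of 15,200 pages to stand in for "15000+"; any total between 15,001 and 18,000 needs 6 days at 3,000 pages per day, which matches the 6 days above. `daily_batches` is an illustrative helper.

```python
def daily_batches(urls, daily_limit):
    """Split a URL list into consecutive batches of at most daily_limit URLs.

    Running one batch per day keeps each day under the site's page quota.
    """
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return [urls[i:i + daily_limit] for i in range(0, len(urls), daily_limit)]


# Hypothetical list of 15,200 building pages (standing in for "15000+").
urls = [f"https://example.com/building/{n}" for n in range(15_200)]
batches = daily_batches(urls, 3000)
for day, batch in enumerate(batches, start=1):
    print(f"day {day}: {len(batch)} pages")
```

Since the block was triggered above 3,000 pages a day, each batch here is capped at exactly that number.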

Singapore Condo Database – source: www.singaporeexpats.com – Made for a customer; it took 3 hours, and I sold the database of 2,809 condos for $140.50 SGD.

Singapore Condo Database II – source: www.propertyguru.com.sg – Made for a customer. It looked like an easy project, with plain links to all condos, but it turned difficult because a CAPTCHA page appeared every 10 pages extracted. My programmer partner spent 5 days on it in Visual Basic and charged me $300 USD. In the end I sold the database of 3,176 condos for $317.60 SGD (about 240 USD), which leaves me at a loss unless I sell the same database to a second customer.

Sulekha.xls – source: www.sulekha.com – A somewhat unusual scraping job: a one-time-use database for SMS and email marketing, rather than a saleable product containing all car models, all buildings, or all of something.
