Google Finance Scraping Spider in PHP, scrape millions of companies and exchange prices

Google Finance Scraping Spider PHP Code

Project offered by compunect [scraping@compunect.com]. Last successful test run: April 2014. Version 1.2 released.

The Google Finance Scraping Spider is a powerful, open-source scraping project written in well-structured PHP. You may use this code as it is (see below) or customize it to power your startup or project. The finance database of Google is a rich environment, and this project scrapes only a small part of it (see below). I am also working on more scraping projects related to Google Finance. compunect is an IT services and development company that was founded in Germany, is now located in the Czech Republic, and focuses on challenging tasks.

Google Finance Scraping Spider primary features:

  • Continuous, trouble-free 24/7 operation without getting blocked or detected by Google.

  • This project can scrape all companies out of Google Finance and get their titles, identifiers and, if applicable, stock exchange and price data.

  • Uses the latest available Google Finance internal API, which is normally used to build Google Finance charts.

  • Can spider into Google Finance to get all companies out of the database; no previous knowledge required.

  • Automated IP rotation and management, delay management and emergency mechanisms

  • Fully automated functions, human-readable output and machine-readable storage arrays

  • Multiple operation modes (scrape all companies; all companies and details; only selected companies and details)

  • Well-structured and readable code, easy to modify and customize

  • Local storage cache that makes the best possible use of IP addresses and lets the scraper continue where it was stopped or interrupted

  • Very easy to start and configure; does not require expert programming know-how

  • Open-source PHP code available for commercial use, easily worth multiple thousand USD

  • Perfectly suitable as a foreground or background process on a Linux server, but also compatible with other operating systems
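
The local storage cache from the feature list can be illustrated with a small file-based cache. This is only a minimal sketch under stated assumptions: the function names `cache_path`, `cache_put` and `cache_get`, the one-file-per-key layout and the use of `serialize()` are all hypothetical and are not taken from the project's own code.

```php
<?php
// Hypothetical helpers for illustration; not part of functions-sgf.php.

/** Build the cache file path for a key inside the cache directory. */
function cache_path($dir, $key)
{
    return rtrim($dir, '/') . '/' . md5($key) . '.cache';
}

/** Store a value under a key; returns true on success. */
function cache_put($dir, $key, $value)
{
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }
    return file_put_contents(cache_path($dir, $key), serialize($value), LOCK_EX) !== false;
}

/**
 * Fetch a cached value, or null if it is missing or older than $max_age seconds.
 * A negative $max_age disables the expiry check (use whatever is cached).
 */
function cache_get($dir, $key, $max_age)
{
    $file = cache_path($dir, $key);
    clearstatcache(true, $file);            // avoid stale modification times
    if (!is_file($file)) {
        return null;
    }
    if ($max_age >= 0 && (time() - filemtime($file)) > $max_age) {
        return null;                        // entry has expired
    }
    $raw = file_get_contents($file);
    return $raw === false ? null : unserialize($raw);
}
```

Because every result is written to disk as soon as it arrives, a restarted run can skip keys that are still fresh, which saves requests and therefore proxy IP usage.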


The Google Search Scraper was quite a big success: it powers projects at some of the world's best-known companies (Forbes 100) and at many promising startups. I recently offered the Google Suggest Scraping Spider, a free PHP project which can scrape and spider the internal Google Suggest API, revealing what the world is really searching for. This project can scrape all the companies out of Google Finance and, in a second step, find out their stock exchange and current exchange prices. A while ago it was possible to get this detailed data through a free API from Google, limited of course to a highly restricted number of requests per day. This Google Finance Scraper is not only able to get the realtime information out of Google, it can do so for every single company available without getting blocked. Of course this is just a fraction of the information available; the project can be extended to draw charts (by scraping more than the last price) or to collect a lot of additional meta information.

Scraping Challenges

For almost any developer, finishing such a project is quite a challenging task. Scraping raises many possible issues, and this knowledge is not usually taught at university. All functions within Google Finance are obfuscated, and all the APIs are either obfuscated or at least undocumented, so it takes quite some research and time to find out what this or that variable or output is doing. When it comes to scraping, it is important to behave like a browser and a user. Wrong behaviour will lead to a short-term or long-term block of IPs, threatening your company, server and project. This code solves all the typical challenges and provides an easily customizable foundation for your own project, or it may be used as it is.

IP/Proxy management

When a service as large as Google Finance is accessed automatically, the result is usually a block or ban of the IP addresses used. The reason is usually a wrong delay between requests, wrong usage of IP addresses or bad code. This project solves the challenges of proxy management; it uses the APIs of us-proxies.com to get new IP addresses on the fly. These measures have been taken to prevent detection during scraping:

  • Local file-based caching reduces the number of requests and allows the scraper to be stopped and started without loss of information

  • IP management routines remember the usage of each single IP address and read/store this data in a local file

  • Delay routines put correct delays between requests and IP changes to avoid detection and obviously non-organic request rates

  • Request calls and HTTP headers fake a Chrome browser; the code simulates searching through a Firefox browser

  • A powerful HTTP library (libcURL) is used to connect to Google Finance, which allows fine control of the behaviour
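
Two of these measures, browser-like requests through a proxy and randomized delays, can be sketched with PHP's cURL extension. This is an illustrative sketch, not the project's implementation: the function names `new_browser_like_session` and `pick_delay_us`, the header values, the placeholder user-agent string and the delay bounds are all assumptions.

```php
<?php
// Hypothetical helpers for illustration; names and values are assumptions.

/** Create a cURL handle that sends browser-like headers through an HTTP proxy. */
function new_browser_like_session($proxy_host, $proxy_port)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,    // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_ENCODING       => '',      // accept every encoding libcurl supports
        CURLOPT_PROXY          => $proxy_host,
        CURLOPT_PROXYPORT      => (int)$proxy_port,
        CURLOPT_PROXYTYPE      => CURLPROXY_HTTP,
        // Placeholder user agent; pick one that matches the browser you imitate.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0',
        CURLOPT_HTTPHEADER     => array(
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ),
    ));
    return $ch;
}

/** Pick a random delay in microseconds so requests are not evenly spaced. */
function pick_delay_us($min_seconds, $max_seconds)
{
    return mt_rand((int)($min_seconds * 1000000), (int)($max_seconds * 1000000));
}
```

A request loop would call `usleep(pick_delay_us(1.5, 3.0))` between `curl_exec()` calls; suitable bounds depend on the number of IPs available and have to be found by testing.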

US-Proxy support

This project runs through a US proxy service. Using the supplied API, it is possible to scrape millions of results without getting blocked. The benefit of us-proxies.com is an easily extendable IP service that provides the best IP quality in the industry at a fair price and is aimed at professionals. However, the code is not limited to this particular service; you are free to adapt the source to suit your needs.
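
Whichever proxy provider is used, the per-IP bookkeeping described earlier (remembering the usage of each IP address in a local file) could look roughly like the following. This is a sketch under assumptions: the JSON file format, the function names and the "minimum rest time" rule are hypothetical and not taken from the project.

```php
<?php
// Hypothetical IP bookkeeping for illustration; format and names are assumptions.

/** Load the usage map (IP => stats) from a JSON file; empty array if absent. */
function load_ip_usage($file)
{
    if (!is_file($file)) {
        return array();
    }
    $data = json_decode((string)file_get_contents($file), true);
    return is_array($data) ? $data : array();
}

/** Return a copy of the usage map with one more request counted for $ip. */
function record_ip_use(array $usage, $ip, $now)
{
    if (!isset($usage[$ip])) {
        $usage[$ip] = array('requests' => 0, 'last_used' => 0);
    }
    $usage[$ip]['requests']++;
    $usage[$ip]['last_used'] = $now;
    return $usage;
}

/** True if $ip was never used or has rested for at least $min_rest_seconds. */
function ip_is_rested(array $usage, $ip, $now, $min_rest_seconds)
{
    if (!isset($usage[$ip])) {
        return true;
    }
    return ($now - $usage[$ip]['last_used']) >= $min_rest_seconds;
}

/** Persist the usage map so a restarted run keeps its history. */
function save_ip_usage($file, array $usage)
{
    return file_put_contents($file, json_encode($usage), LOCK_EX) !== false;
}
```

Storing the map on disk means a restart does not reset the history, so an IP that was used heavily just before an interruption is not reused too early.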

Example output from the scraper

Here are two example result-sets from a test-run:
Mode: 'Spidering and Stock details'. Limited to 1000 results; took just a few minutes (using a production-size us-proxies plan).

Google Finance Exchange Spider results (unsorted)

| Identifier | Exchange | Company name | Price | Price time |
| ---------- | ---------- | ---------- | ---------- | ---------- |
| .DJI | INDEXDJX | Dow Jones Industrial Average | 16539.22 | 13:50:00 GMT |
| AAPL | NASDAQ | Apple Inc. | 541.792 | 13:46:00 GMT |
| BAC | NYSE | Bank of America Corp | 17.215 | 13:46:00 GMT |
| AMZN | NASDAQ | Amazon.com, Inc. | 346.52 | 13:46:00 GMT |
| A | NYSE | Agilent Technologies Inc. | 56.55 | 13:46:00 GMT |
| AMD | NYSE | Advanced Micro Devices, Inc. | 4.075 | 13:52:00 GMT |
| T | NYSE | AT&T Inc. | 35.29 | 13:56:00 GMT |
| AIG | NYSE | American International Group Inc | 50.34 | 13:46:00 GMT |
| BRK.A | NYSE | Berkshire Hathaway Inc. | 187213 | 20:00:00 GMT |
| FNMA | OTCBB | Federal National Mortgage Assctn Fnni Me | 3.98 | 13:40:00 GMT |
| AA | NYSE | Alcoa Inc | 12.79 | 13:46:00 GMT |
| ALU | NYSE | Alcatel Lucent SA (ADR) | 4.17 | 13:52:00 GMT |
| ATVI | NASDAQ | Activision Blizzard, Inc. | 20.88 | 13:46:00 GMT |
| NLY | NYSE | Annaly Capital Management, Inc. | 10.99 | 13:52:00 GMT |
| AKS | NYSE | AK Steel Holding Corporation | 7.41 | 13:46:00 GMT |
| BAC | NYSE | Bank of America Corp | 17.215 | 13:46:00 GMT |
| IBM | NYSE | International Business Machines Corp. | 193.46 | 13:52:00 GMT |
| BIDU | NASDAQ | Baidu Inc (ADR) | 162.66 | 13:46:00 GMT |
| BBRY | NASDAQ | BlackBerry Ltd | 8.16 | 13:46:00 GMT |
| FAZ | NYSEARCA | Direxion Daily Financial Bear 3X Shares | 19.27 | 13:56:00 GMT |
| BRK.A | NYSE | Berkshire Hathaway Inc. | 187213 | 20:00:00 GMT |
| FAS | NYSEARCA | Direxion Daily Financial Bull 3X Shares | 95.72 | 13:56:00 GMT |
| BP | NYSE | BP plc (ADR) | 48.63 | 13:46:00 GMT |
| BA | NYSE | The Boeing Company | 128.02 | 13:46:00 GMT |
| BBY | NYSE | Best Buy Co., Inc. | 26.5 | 13:46:00 GMT |
| B | NYSE | Barnes Group Inc. | 39.19 | 13:46:00 GMT |
| ATVI | NASDAQ | Activision Blizzard, Inc. | 20.88 | 13:46:00 GMT |
| ERX | NYSEARCA | Direxion Daily Energy Bull 3X Shs(ETF) | 92.92 | 13:48:00 GMT |
| GBP | CURRENCY | British Pound Sterling | -1 | 00:00:00 GMT |
| SENSEX | INDEXBOM | S&P BSE SENSEX | 22551.49 | 10:10:00 GMT |
| C | NYSE | Citigroup Inc | 47.91 | 13:46:00 GMT |
| .IXIC | INDEXNASDA | NASDAQ Composite | 4282.585 | 13:52:00 GMT |
| CSCO | NASDAQ | Cisco Systems, Inc. | 23 | 13:46:00 GMT |
| JPM | NYSE | JPMorgan Chase & Co. | 60.44 | 13:50:00 GMT |
| CAT | NYSE | Caterpillar Inc. | 100.63 | 13:46:00 GMT |
| CHK | NYSE | Chesapeake Energy Corporation | 26.36 | 13:48:00 GMT |
| CTSH | NASDAQ | Cognizant Technology Solutions Corp | 52.91 | 13:46:00 GMT |
| FCX | NYSE | Freeport-McMoRan Copper & Gold Inc. | 33.24 | 13:48:00 GMT |
| KO | NYSE | The Coca-Cola Company | 38.45 | 13:50:00 GMT |
| CRM | NYSE | salesforce.com, inc. | 58.66 | 13:46:00 GMT |
| NLY | NYSE | Annaly Capital Management, Inc. | 10.99 | 13:52:00 GMT |
| GLW | NYSE | Corning Incorporated | 21.13 | 13:48:00 GMT |
| NIFTY | NSE | CNX NIFTY | 6752.55 | 10:02:00 GMT |
| CMG | NYSE | Chipotle Mexican Grill, Inc. | 580.75 | 13:46:00 GMT |
| COP | NYSE | ConocoPhillips | 70.28 | 13:46:00 GMT |
| .DJI | INDEXDJX | Dow Jones Industrial Average | 16539.22 | 13:50:00 GMT |
| AMD | NYSE | Advanced Micro Devices, Inc. | 4.075 | 13:52:00 GMT |
| USD | CURRENCY | US Dollar | -1 | 00:00:00 GMT |
| D | NYSE | Dominion Resources, Inc. | 70.1 | 13:46:00 GMT |
| FAZ | NYSEARCA | Direxion Daily Financial Bear 3X Shares | 19.27 | 13:56:00 GMT |
| FAS | NYSEARCA | Direxion Daily Financial Bull 3X Shares | 95.72 | 13:56:00 GMT |
| DIS | NYSE | The Walt Disney Company | 81.67 | 13:58:00 GMT |
| DDD | NYSE | 3D Systems Corporation | 58.29 | 13:48:00 GMT |
| DELL | NASDAQ | Dell Inc. | -1 | 00:00:00 GMT |
| DLLR | NASDAQ | DFC Global Corp | 9.425 | 13:48:00 GMT |
| DRYS | NASDAQ | DryShips Inc. | 3.2899 | 13:48:00 GMT |
| ERX | NYSEARCA | Direxion Daily Energy Bull 3X Shs(ETF) | 92.92 | 13:48:00 GMT |
| HD | NYSE | The Home Depot, Inc. | 79.85 | 13:50:00 GMT |
| TNA | NYSEARCA | Direxion Small Cap Bull 3X Shares (ETF) | 81.22 | 13:56:00 GMT |
| DNDN | NASDAQ | Dendreon Corporation | 3.0199 | 13:48:00 GMT |
| GE | NYSE | General Electric Company | 25.92 | 13:48:00 GMT |
| EMC | NYSE | EMC Corporation | 27.94 | 13:48:00 GMT |
| XOM | NYSE | Exxon Mobil Corporation | 97.43 | 13:52:00 GMT |
| EBAY | NASDAQ | eBay Inc | 56.1 | 13:48:00 GMT |
| EUR | CURRENCY | Euro | -1 | 00:00:00 GMT |
| CHK | NYSE | Chesapeake Energy Corporation | 26.36 | 13:48:00 GMT |
| GRBEQ | OTCMKTS | Grubb & Ellis Company | -1 | 00:00:00 GMT |
| ETFC | NASDAQ | E TRADE Financial Corporation | 23.529 | 13:56:00 GMT |
| ERX | NYSEARCA | Direxion Daily Energy Bull 3X Shs(ETF) | 92.92 | 13:48:00 GMT |
| E | NYSE | Eni SpA (ADR) | 49.96 | 13:42:00 GMT |
| EWZ | NYSEARCA | iShares MSCI Brazil Index (ETF) | 45.25 | 13:50:00 GMT |
| EA | NASDAQ | Electronic Arts Inc. | 29.4225 | 13:48:00 GMT |
| AXP | NYSE | American Express Company | 90.23 | 13:48:00 GMT |
| EPD | NYSE | Enterprise Products Partners L.P. | 70.91 | 13:48:00 GMT |
| VWO | NYSEARCA | Vanguard FTSE Emerging Markets ETF | 40.93 | 13:58:00 GMT |
| F | NYSE | Ford Motor Company | 16.22 | 13:52:00 GMT |
| FB | NASDAQ | Facebook Inc | 63.8901 | 13:48:00 GMT |
| UKX | INDEXFTSE | FTSE 100 | 6665.56 | 13:56:00 GMT |
| WFC | NYSE | Wells Fargo & Co | 49.7 | 13:58:00 GMT |
| FAZ | NYSEARCA | Direxion Daily Financial Bear 3X Shares | 19.27 | 13:56:00 GMT |
| FNMA | OTCBB | Federal National Mortgage Assctn Fnni Me | 3.98 | 13:40:00 GMT |
| FAS | NYSEARCA | Direxion Daily Financial Bull 3X Shares | 95.72 | 13:56:00 GMT |
| FMCC | OTCBB | Federal Home Loan Mortgage Corp | 3.94 | 13:40:00 GMT |
| FSLR | NASDAQ | First Solar, Inc. | 71.66 | 13:48:00 GMT |
| FCX | NYSE | Freeport-McMoRan Copper & Gold Inc. | 33.24 | 13:48:00 GMT |
| FFIV | NASDAQ | F5 Networks, Inc. | 110.17 | 13:52:00 GMT |
| ETFC | NASDAQ | E TRADE Financial Corporation | 23.529 | 13:56:00 GMT |
| WBS | NYSE | Webster Financial Corporation | 31.6 | 13:48:00 GMT |
|  | UNKNOWN EX | iPath S&P 500 VIX Short Term Futures TM ETN | -1 | 00:00:00 GMT |
| XLF | NYSEARCA | Select Sector Financial Slct Str SPDR Fd | 22.39 | 13:48:00 GMT |
| GOOG | NASDAQ | Google Inc | 1138.3 | 13:48:00 GMT |
| GE | NYSE | General Electric Company | 25.92 | 13:48:00 GMT |
| GLD | NYSEARCA | SPDR Gold Trust (ETF) | 124.63 | 13:56:00 GMT |
| GRPN | NASDAQ | Groupon Inc | 8.13 | 13:48:00 GMT |
| GS | NYSE | Goldman Sachs Group Inc | 166.4 | 13:56:00 GMT |
| GM | NYSE | General Motors Company | 34.595 | 13:52:00 GMT |
| DLLR | NASDAQ | DFC Global Corp | 9.425 | 13:48:00 GMT |
| FCX | NYSE | Freeport-McMoRan Copper & Gold Inc. | 33.24 | 13:48:00 GMT |
| PHYS | NYSEARCA | Sprott Physical Gold Trust | 10.74 | 13:56:00 GMT |
| UGL | NYSEARCA | ProShares Ultra Gold (ETF) | 47.09 | 13:54:00 GMT |
| PG | NYSE | The Procter & Gamble Company | 80 | 13:54:00 GMT |
| GRBEQ | OTCMKTS | Grubb & Ellis Company | -1 | 00:00:00 GMT |
| GLW | NYSE | Corning Incorporated | 21.13 | 13:48:00 GMT |
| CMG | NYSE | Chipotle Mexican Grill, Inc. | 580.75 | 13:46:00 GMT |
| GBP | CURRENCY | British Pound Sterling | -1 | 00:00:00 GMT |
| HPQ | NYSE | Hewlett-Packard Company | 33.38 | 13:54:00 GMT |
| BRK.A | NYSE | Berkshire Hathaway Inc. | 187213 | 20:00:00 GMT |
| FMCC | OTCBB | Federal Home Loan Mortgage Corp | 3.94 | 13:40:00 GMT |
| HD | NYSE | The Home Depot, Inc. | 79.85 | 13:50:00 GMT |
| HEB | NYSEMKT | Hemispherx BioPharma, Inc | 0.4 | 13:46:00 GMT |
| HSI | INDEXHANGS | HANG SENG INDEX | 22523.94 | 08:02:00 GMT |
| HAL | NYSE | Halliburton Company | 59.6 | 13:50:00 GMT |
| HLF | NYSE | Herbalife Ltd. | 58.21 | 13:50:00 GMT |
| RAX | NYSE | Rackspace Hosting, Inc. | 32.26 | 13:56:00 GMT |
| YGE | NYSE | Yingli Green Energy Hold. Co. Ltd. (ADR) | 4.69 | 13:50:00 GMT |
| RHT | NYSE | Red Hat Inc | 53.41 | 13:56:00 GMT |
| 2498 | TPE | HTC Corp | 153.5 | 05:32:00 GMT |
| VYM | NYSEARCA | Vanguard High Dividend Yield ETF | 63.32 | 13:46:00 GMT |
| HIG | NYSE | Hartford Financial Services Group Inc | 35.66 | 13:50:00 GMT |
| DVA | NYSE | DaVita HealthCare Partners Inc | 69.14 | 13:50:00 GMT |
| .DJI | INDEXDJX | Dow Jones Industrial Average | 16539.22 | 13:50:00 GMT |
| .INX | INDEXSP | S&P 500 | 1886.97 | 13:56:00 GMT |
| .IXIC | INDEXNASDA | NASDAQ Composite | 4282.585 | 13:52:00 GMT |
| INTC | NASDAQ | Intel Corporation | 25.78 | 13:50:00 GMT |
| IBM | NYSE | International Business Machines Corp. | 193.46 | 13:52:00 GMT |
| SLV | NYSEARCA | iShares Silver Trust (ETF) | 19.3 | 13:56:00 GMT |
| NATI | NASDAQ | National Instruments Corp | 29.33 | 13:52:00 GMT |
| ISRG | NASDAQ | Intuitive Surgical, Inc. | 501.48 | 13:56:00 GMT |
| IWM | NYSEARCA | iShares Russell 2000 Index (ETF) | 117.95 | 13:54:00 GMT |
|  | UNKNOWN EX | iPath S&P 500 VIX Short Term Futures TM ETN | -1 | 00:00:00 GMT |
| EWZ | NYSEARCA | iShares MSCI Brazil Index (ETF) | 45.25 | 13:50:00 GMT |
| OMXS30 | INDEXNASDA | OMX Stockholm 30 Index | 1376.56 | 13:54:00 GMT |
| HSI | INDEXHANGS | HANG SENG INDEX | 22523.94 | 08:02:00 GMT |
| FIO | NYSE | Fusion-IO, Inc. | 10.72 | 13:50:00 GMT |
| INO | NYSEMKT | Inovio Pharmaceuticals Inc | 3.41 | 13:50:00 GMT |
| .DJI | INDEXDJX | Dow Jones Industrial Average | 16539.22 | 13:50:00 GMT |
| JPM | NYSE | JPMorgan Chase & Co. | 60.44 | 13:50:00 GMT |
| JNJ | NYSE | Johnson & Johnson | 97.79 | 13:50:00 GMT |
| JNPR | NYSE | Juniper Networks, Inc. | 26.51 | 13:52:00 GMT |
| JCP | NYSE | J.C. Penney Company, Inc. | 9 | 13:50:00 GMT |
| DIA | NYSEARCA | SPDR Dow Jones Industrial Average ETF | 165.05 | 13:50:00 GMT |
| JASO | NASDAQ | JA Solar Holdings Co., Ltd. (ADR) | 10.9299 | 13:50:00 GMT |
| VSEAX | MUTF | JPMorgan Small Cap Equity Fund Class A | 45.6 | 20:00:00 GMT |
| DATE | NASDAQ | Jiayuan.com International Ltd | 6.86 | 13:48:00 GMT |
| JDSU | NASDAQ | JDS Uniphase Corp | 14.5 | 13:50:00 GMT |
| EWJ | NYSEARCA | iShares MSCI Japan ETF | 11.395 | 13:50:00 GMT |
| JNK | NYSEARCA | SPDR Barclays Capital High Yield Bnd ETF | 41.15 | 13:50:00 GMT |
| JKS | NYSE | JinkoSolar Holding Co., Ltd. | 31.57 | 13:50:00 GMT |
| JBLU | NASDAQ | JetBlue Airways Corporation | 9.08 | 13:50:00 GMT |
| JMBA | NASDAQ | Jamba, Inc. | 12.27 | 13:48:00 GMT |
| K | NYSE | Kellogg Company | 62.59 | 13:50:00 GMT |
| KO | NYSE | The Coca-Cola Company | 38.45 | 13:50:00 GMT |
| GMCR | NASDAQ | Keurig Green Mountain Inc | 110.01 | 13:50:00 GMT |
| KERX | NASDAQ | Keryx Biopharmaceuticals | 17.19 | 13:50:00 GMT |
| KOG | NYSE | Kodiak Oil & Gas Corp (USA) | 12.13 | 13:54:00 GMT |
| KNTH | OTCMKTS | Farpoint Properties Inc | -1 | 00:00:00 GMT |
| K | TSE | Kinross Gold Corporation | 4.75 | 13:38:00 GMT |
| EKDKQ | OTCMKTS | Eastman Kodak Company | -1 | 00:00:00 GMT |
| KMP | NYSE | Kinder Morgan Energy Partners LP | 74.58 | 13:52:00 GMT |
|  | UNKNOWN EX | KCG Holdings Inc | -1 | 00:00:00 GMT |
| KEY | NYSE | KeyCorp | 14.29 | 13:52:00 GMT |
| KKD | NYSE | Krispy Kreme Doughnuts | 17.74 | 13:52:00 GMT |
| PHG | NYSE | Koninklijke Philips NV (ADR) | 35.26 | 13:52:00 GMT |
| KORS | NYSE | Michael Kors Holdings Ltd | 95.83 | 13:52:00 GMT |
| KBH | NYSE | KB Home | 17.53 | 13:52:00 GMT |
| LVS | NYSE | Las Vegas Sands Corp. | 82.5 | 13:58:00 GMT |
| LNKD | NYSE | LinkedIn Corp | 187.98 | 13:52:00 GMT |
| ALU | NYSE | Alcatel Lucent SA (ADR) | 4.17 | 13:52:00 GMT |
| FMCC | OTCBB | Federal Home Loan Mortgage Corp | 3.94 | 13:40:00 GMT |
| MTLQQ | OTCMKTS | Motors Liquidation Co | -1 | 00:00:00 GMT |
| ABT | NYSE | Abbott Laboratories | 38.18 | 13:52:00 GMT |
| LSI | NASDAQ | LSI Corp | 11.065 | 13:52:00 GMT |
| LUV | NYSE | Southwest Airlines Co | 24.14 | 13:52:00 GMT |
| OCZTQ | OTCMKTS | ZCO Liquidating Corp | 0.023 | 20:00:00 GMT |
| LVLT | NYSE | Level 3 Communications, Inc. | 40.82 | 13:52:00 GMT |
| LDKSY | OTCMKTS | LDK Solar Co., Ltd (ADR) | 0.335 | 13:38:00 GMT |
| L | NYSE | Loews Corporation | 44.37 | 13:52:00 GMT |
| DAL | NYSE | Delta Air Lines, Inc. | 36.23 | 13:52:00 GMT |
| LULU | NASDAQ | Lululemon Athletica inc. | 52.8 | 13:52:00 GMT |
| LXU | NYSE | LSB Industries, Inc. | 37.98 | 13:52:00 GMT |
| F | NYSE | Ford Motor Company | 16.22 | 13:52:00 GMT |
| MSFT | NASDAQ | Microsoft Corporation | 41.595 | 13:52:00 GMT |
| TSLA | NASDAQ | Tesla Motors Inc | 220.274 | 13:56:00 GMT |
| M | NYSE | Macy's, Inc. | 59.66 | 13:52:00 GMT |
| AMD | NYSE | Advanced Micro Devices, Inc. | 4.075 | 13:52:00 GMT |
| IBM | NYSE | International Business Machines Corp. | 193.46 | 13:52:00 GMT |
| WMT | NYSE | Wal-Mart Stores, Inc. | 76.6 | 13:58:00 GMT |
| FNMA | OTCBB | Federal National Mortgage Assctn Fnni Me | 3.98 | 13:40:00 GMT |
| XOM | NYSE | Exxon Mobil Corporation | 97.43 | 13:52:00 GMT |
| PCS | NYSE | T-Mobile Us Inc | -1 | 00:00:00 GMT |
| MCD | NYSE | McDonald's Corporation | 98.02 | 13:52:00 GMT |
| P | NYSE | Pandora Media Inc | 32.4 | 13:54:00 GMT |
| FMCC | OTCBB | Federal Home Loan Mortgage Corp | 3.94 | 13:40:00 GMT |
| GM | NYSE | General Motors Company | 34.595 | 13:52:00 GMT |
| MU | NASDAQ | Micron Technology, Inc. | 24.235 | 13:52:00 GMT |
| .IXIC | INDEXNASDA | NASDAQ Composite | 4282.585 | 13:52:00 GMT |
| NFLX | NASDAQ | Netflix, Inc. | 365.609 | 13:52:00 GMT |
| NOK | NYSE | Nokia Corporation (ADR) | 7.63 | 13:52:00 GMT |
| NVDA | NASDAQ | NVIDIA Corporation | 18.595 | 13:52:00 GMT |
| FNMA | OTCBB | Federal National Mortgage Assctn Fnni Me | 3.98 | 13:40:00 GMT |
| NATI | NASDAQ | National Instruments Corp | 29.33 | 13:52:00 GMT |
| N | NYSE | NetSuite Inc | 95.62 | 13:52:00 GMT |
| NLY | NYSE | Annaly Capital Management, Inc. | 10.99 | 13:52:00 GMT |
| FFIV | NASDAQ | F5 Networks, Inc. | 110.17 | 13:52:00 GMT |
| JNPR | NYSE | Juniper Networks, Inc. | 26.51 | 13:52:00 GMT |
| NDX | INDEXNASDA | NASDAQ-100 | 3672.741 | 13:52:00 GMT |
| NIFTY | NSE | CNX NIFTY | 6752.55 | 10:02:00 GMT |
| NI225 | INDEXNIKKE | Nikkei 225 | 14946.32 | 06:02:00 GMT |
| NTAP | NASDAQ | NetApp Inc. | 37.384 | 13:52:00 GMT |
| NKE | NYSE | Nike Inc | 74.76 | 13:52:00 GMT |
| O | NYSE | Realty Income Corp | 40.6 | 13:52:00 GMT |
| ORCL | NYSE | Oracle Corporation | 41.32 | 13:52:00 GMT |
| OCZTQ | OTCMKTS | ZCO Liquidating Corp | 0.023 | 20:00:00 GMT |
| OMXS30 | INDEXNASDA | OMX Stockholm 30 Index | 1376.56 | 13:54:00 GMT |
| USO | NYSEARCA | United States Oil Fund LP (ETF) | 35.79 | 13:56:00 GMT |
| DBO | NYSEARCA | PowerShares DB Oil Fund (ETF) | 28.01 | 13:52:00 GMT |
| KOG | NYSE | Kodiak Oil & Gas Corp (USA) | 12.13 | 13:54:00 GMT |
| NDAQ | NASDAQ | NASDAQ OMX Group, Inc. | 36.46 | 13:54:00 GMT |
| COF | NYSE | Capital One Financial Corp. | 76.84 | 13:54:00 GMT |
| OUTR | NASDAQ | Outerwall Inc | 72.3 | 13:54:00 GMT |
| RWV | NYSEARCA | RevenueShares Navellier Overal A-100 ETF | 50.9238 | 20:00:00 GMT |
| OLED | NASDAQ | Universal Display Corporation | 31.891 | 13:52:00 GMT |
| UCO | NYSEARCA | ProShares Ultra DJ-UBS Crude Oil | 33.06 | 13:54:00 GMT |
| NOV | NYSE | National-Oilwell Varco, Inc. | 78.16 | 13:54:00 GMT |
| AEO | NYSE | American Eagle Outfitters | 12.24 | 13:54:00 GMT |
| P | NYSE | Pandora Media Inc | 32.4 | 13:54:00 GMT |
| QQQ | NASDAQ | PowerShares QQQ Trust, Series 1 (ETF) | 89.4301 | 13:56:00 GMT |
| HPQ | NYSE | Hewlett-Packard Company | 33.38 | 13:54:00 GMT |
| PCLN | NASDAQ | Priceline.com Inc | 1266.76 | 13:54:00 GMT |
| PCS | NYSE | T-Mobile Us Inc | -1 | 00:00:00 GMT |
| PFE | NYSE | Pfizer Inc. | 32.03 | 13:54:00 GMT |
| UVXY | NYSEARCA | ProShares Ultra Silver (ETF) | 56.35 | 13:56:00 GMT |
| SSO | NYSEARCA | ProShares Ultra S&P500 (ETF) | 106.9 | 13:56:00 GMT |
| PHYS | NYSEARCA | Sprott Physical Gold Trust | 10.74 | 13:56:00 GMT |
| UGL | NYSEARCA | ProShares Ultra Gold (ETF) | 47.09 | 13:54:00 GMT |
| PG | NYSE | The Procter & Gamble Company | 80 | 13:54:00 GMT |
|  | UNKNOWN EX | Perrigo Company PLC | -1 | 00:00:00 GMT |
| ARNA | NASDAQ | Arena Pharmaceuticals, Inc. | 6.61 | 13:54:00 GMT |
| STPFQ | OTCMKTS | Suntech Power Holdings Co., Ltd. (ADR) | 0.327 | 13:38:00 GMT |
| GBP | CURRENCY | British Pound Sterling | -1 | 00:00:00 GMT |
| QCOM | NASDAQ | QUALCOMM, Inc. | 79.98 | 13:54:00 GMT |
| QQQ | NASDAQ | PowerShares QQQ Trust, Series 1 (ETF) | 89.4301 | 13:56:00 GMT |
| QCOR | NASDAQ | Questcor Pharmaceuticals Inc | 69.23 | 13:54:00 GMT |
| QIHU | NYSE | Qihoo 360 Technology Co Ltd | 102.18 | 13:54:00 GMT |
| QTM | NYSE | Quantum Corp | 1.19 | 13:54:00 GMT |
| QLD | NYSEARCA | ProShares Ultra QQQ (ETF) | 103.61 | 13:54:00 GMT |
| OBQI | OTCMKTS | Oilsands Quest Inc. | 0.0216 | 20:00:00 GMT |
| QLGC | NASDAQ | QLogic Corporation | 12.9 | 13:54:00 GMT |
| ZQK | NYSE | Quiksilver, Inc. | 7.83 | 13:54:00 GMT |
| DGX | NYSE | Quest Diagnostics Inc | 59.85 | 13:54:00 GMT |
| Q | NYSE | Quintiles Transnational Holdings Inc | 50.9 | 13:54:00 GMT |
| QNST | NASDAQ | QuinStreet Inc | 6.36 | 13:54:00 GMT |
| QTWW | NASDAQ | Quantum Fuel Systems Tech Worldwide Inc | 9.91 | 13:54:00 GMT |
| SQM | NYSE | Sociedad Quimica y Minera de Chile (ADR) | 31.03 | 13:54:00 GMT |
| KWK | NYSE | Quicksilver Resources Inc | 2.6 | 13:54:00 GMT |
| MGM | NYSE | MGM Resorts International | 26.43 | 13:54:00 GMT |
| ARR | NYSE | ARMOUR Residential REIT, Inc. | 4.19 | 13:54:00 GMT |
| RAD | NYSE | Rite Aid Corporation | 6.43 | 13:54:00 GMT |
| IWM | NYSEARCA | iShares Russell 2000 Index (ETF) | 117.95 | 13:54:00 GMT |
| R | NYSE | Ryder System, Inc. | 81.54 | 13:54:00 GMT |
| RENN | NYSE | Renren Inc | 3.32 | 13:52:00 GMT |
| RVBD | NASDAQ | Riverbed Technology, Inc. | 19.9599 | 13:54:00 GMT |
| RIG | NYSE | Transocean LTD | 41.47 | 13:54:00 GMT |
| RBS | NYSE | Royal Bank of Scotland Group plc (ADR) | 10.69 | 13:54:00 GMT |
| CLF | NYSE | Cliffs Natural Resources Inc | 20 | 13:56:00 GMT |
| REGN | NASDAQ | Regeneron Pharmaceuticals Inc | 307.92 | 13:56:00 GMT |
| FDS | NYSE | FactSet Research Systems Inc. | 107.96 | 13:56:00 GMT |
| RAX | NYSE | Rackspace Hosting, Inc. | 32.26 | 13:56:00 GMT |
| RHT | NYSE | Red Hat Inc | 53.41 | 13:56:00 GMT |
| WYNN | NASDAQ | Wynn Resorts, Limited | 226.39 | 13:56:00 GMT |
| .INX | INDEXSP | S&P 500 | 1886.97 | 13:56:00 GMT |
|  | UNKNOWN EX | Sirius XM Holdings Inc | -1 | 00:00:00 GMT |
| SPY | NYSEARCA | SPDR S&P 500 ETF Trust | 188.48 | 13:56:00 GMT |
| GLD | NYSEARCA | SPDR Gold Trust (ETF) | 124.63 | 13:56:00 GMT |
| LVS | NYSE | Las Vegas Sands Corp. | 82.5 | 13:58:00 GMT |
|  | UNKNOWN EX | Sprint Communications Inc | -1 | 00:00:00 GMT |
| WMT | NYSE | Wal-Mart Stores, Inc. | 76.6 | 13:58:00 GMT |
| SLV | NYSEARCA | iShares Silver Trust (ETF) | 19.3 | 13:56:00 GMT |
| QQQ | NASDAQ | PowerShares QQQ Trust, Series 1 (ETF) | 89.4301 | 13:56:00 GMT |
| FAZ | NYSEARCA | Direxion Daily Financial Bear 3X Shares | 19.27 | 13:56:00 GMT |
| GS | NYSE | Goldman Sachs Group Inc | 166.4 | 13:56:00 GMT |
| FAS | NYSEARCA | Direxion Daily Financial Bull 3X Shares | 95.72 | 13:56:00 GMT |
| SBUX | NASDAQ | Starbucks Corporation | 73.55 | 13:56:00 GMT |
| ISRG | NASDAQ | Intuitive Surgical, Inc. | 501.48 | 13:56:00 GMT |
| S | NYSE | Sprint Corporation | 9.35 | 13:56:00 GMT |
| T | NYSE | AT&T Inc. | 35.29 | 13:56:00 GMT |
| TSLA | NASDAQ | Tesla Motors Inc | 220.274 | 13:56:00 GMT |
| SPY | NYSEARCA | SPDR S&P 500 ETF Trust | 188.48 | 13:56:00 GMT |
| GLD | NYSEARCA | SPDR Gold Trust (ETF) | 124.63 | 13:56:00 GMT |
| SLV | NYSEARCA | iShares Silver Trust (ETF) | 19.3 | 13:56:00 GMT |
| QQQ | NASDAQ | PowerShares QQQ Trust, Series 1 (ETF) | 89.4301 | 13:56:00 GMT |
| PCS | NYSE | T-Mobile Us Inc | -1 | 00:00:00 GMT |
| TM | NYSE | Toyota Motor Corp (ADR) | 112.88 | 13:56:00 GMT |
| PHYS | NYSEARCA | Sprott Physical Gold Trust | 10.74 | 13:56:00 GMT |
| ETFC | NASDAQ | E TRADE Financial Corporation | 23.529 | 13:56:00 GMT |
| TIVO | NASDAQ | TiVo Inc. | 13.315 | 13:56:00 GMT |
| TGT | NYSE | Target Corporation | 60.62 | 13:56:00 GMT |
| TNA | NYSEARCA | Direxion Small Cap Bull 3X Shares (ETF) | 81.22 | 13:56:00 GMT |
| TWTR | NYSE | Twitter Inc | 47.02 | 13:56:00 GMT |
|  | UNKNOWN EX | iPath S&P 500 VIX Short Term Futures TM ETN | -1 | 00:00:00 GMT |
| AAPL | NASDAQ | Apple Inc. | 541.792 | 13:46:00 GMT |
| AA | NYSE | Alcoa Inc | 12.79 | 13:46:00 GMT |
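
The results above are unsorted, contain repeated entries, and use a price of -1 for entries where no price could be obtained. A consumer of the data will usually want to drop both. The following sketch shows one way to do that; the row layout (`id`, `exchange`, `name`, `price`) and the function name `usable_rows` are assumptions made for illustration, not the format of the project's `$dataset` array.

```php
<?php
// Illustrative rows copied from the table above; the array layout is an assumption.
$rows = array(
    array('id' => 'AAPL', 'exchange' => 'NASDAQ', 'name' => 'Apple Inc.', 'price' => 541.792),
    array('id' => 'DELL', 'exchange' => 'NASDAQ', 'name' => 'Dell Inc.', 'price' => -1),
    array('id' => 'IBM',  'exchange' => 'NYSE',   'name' => 'International Business Machines Corp.', 'price' => 193.46),
    array('id' => 'AAPL', 'exchange' => 'NASDAQ', 'name' => 'Apple Inc.', 'price' => 541.792),
);

/** Keep the first occurrence of each exchange:id pair and drop rows without a price. */
function usable_rows(array $rows)
{
    $seen = array();
    $out  = array();
    foreach ($rows as $row) {
        if ($row['price'] < 0) {
            continue;                       // -1 marks "no price available"
        }
        $key = $row['exchange'] . ':' . $row['id'];
        if (isset($seen[$key])) {
            continue;                       // duplicate from the unsorted spider output
        }
        $seen[$key] = true;
        $out[] = $row;
    }
    return $out;
}

$clean = usable_rows($rows);
```

Keying on exchange plus identifier rather than on the identifier alone matters because the same symbol can appear on different exchanges (the table lists `K` on both NYSE and TSE).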

Google Finance Scraping Spider PHP code

This source code is written in PHP and is ready to be used immediately. You can either make an agreement with us-proxies.com for IP addresses or replace the relevant parts and use your own IP solution. Before using the source code please read the license agreement on top of the source code.

Requirements:

  • PHP 5.2 or higher, PHP libCURL and PHP DOM

  • User permissions to write to the local directory (caching)

  • us-proxies API support (professional IP provider)

Download the source code here: scrape-google-finance.php functions-sgf.php
scrape-google-finance.php
<?php
    
/* License:
       Open source for private and commercial use.
       This source code is free to use and modify as long as this comment stays untouched on top.
       URL of original source code: http://scrape-google-finance.compunect.com/
       Author of original source code: http://www.compunect.com
       Under no circumstances and under no legal theory, whether in tort (including negligence), contract, or otherwise, shall the Licensor be liable to anyone for any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or the use of the Original Work including, without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses. This limitation of liability shall not apply to the extent applicable law prohibits such limitation.
       Exceptions:
       Public redistributing modifications of this source code project is not allowed without written agreement.
       Using this work for private and commercial projects is allowed, redistributing it is not allowed without our written agreement.
       In simple words: You may power your project with this code or a customized version of it, but you may NOT redistribute the code. Also any legal consequences are your own problem.
       You may also use this project on your own risk, any problems or financial losses due to the use of this software at your own risk.

       If want to hire me for customization or a similar project please write an email to develop@compunect.com
     */

    /*
     * The Google finance exchange scraping spider can solve these tasks:
     * 1) It can scrape the majority (or even all) companies from google finance without getting blocked or detected.
     * 2) It can scrape the current or last price of the stock and will add the correct timestamp of this price. (realtime or closing price)
     * 3) The scraper can be operated in various modes, it will work full automated without interaction. It can scrape only specific companies or all companies.
     * This project is designed to be run on a linux console either manually or in background. However, you can also run it somewhere else.
     * The data arrays built can get larger than 200 megabytes, the script will calculate an estimated amount depending on the configuration and uses ini_set() to receive the required memory.
     *
     * Credit: This project is powered by the proxy service us-proxies.com, a professional IP service with API. The API code was partly taken from that site.
     * You may remove the specific parts and use your own IPs, however you will need unshared IPs of high quality.
     */
    /*
     * price delay: Google does not have realtime data of all exchanges, there is a delay of 0 seconds up to 24 hours depending on the exchange (http://www.google.com/intl/en/googlefinance/disclaimer/)
     * Note: Not all companies that can be scraped are still in business or are traded at exchanges. This scraper will first try to get a realtime price, if this fails it will try to get the last closing price of the last day within a month.
     * If this still fails too it will give up and store -1 as price to prevent further scraping errors.
     */

    // Version 1.2

    
    error_reporting(E_ALL & ~E_NOTICE);
    require_once "functions-sgf.php";

    
    // ************************* Configuration variables *************************
    // Your API credentials; you need a plan at us-proxies.com to use this feature.
    // You may use another service, but this project was built around the us-proxies service; other services will likely require more work.
    $pwd = "US-PROXIES.COM-API-KEY";    // placeholder: replace with your API key
    $uid = "US-PROXIES-ACCOUNT-ID";     // placeholder: replace with your account ID

    $working_dir = './local_cache_sgf';
    
    $test_mode = 'matches_and_details';   // 'matches_and_details' = scrape matches and their detail data (prices, etc.)
    //$test_mode = 'details_only';        // 'details_only' = scrape only those companies listed in $test_companies
    //$test_mode = 'matches_only';        // 'matches_only' = scrape all companies, no details
    // Usage note: You can first scrape for companies (matches) only, and then use the matches_and_details mode to scrape for details using the matches from local cache. That's the faster approach.
    if ($test_mode != 'matches_only') $mem_multiplier = 2; else $mem_multiplier = 1;
    $test_companies = 'NYSE:IBM,GM,GOOG,AAPL,005930'; // for test_mode='details_only'; comma separated list of exchange identifiers. You can use the ID, the simple identifier or exchange:identifier
    $test_mutate_depth = 4;     // for the spidering modes; how deep to spider, 4 letters should reveal the large majority of all available companies (more than half a million)
    // 4 characters are required to get a mostly complete business list but the efficiency is lower for the 4th character

    // Note: ^ is PHP's bitwise XOR operator, not exponentiation (pow() computes a power)
    $megs = (int)((26 ^ $test_mutate_depth) * 145) * $mem_multiplier; // in php 5.3 the memory usage could be cut by 2/3 through pre allocated arrays
    ini_set("memory_limit", $megs . "M");  // memory usage can be quite heavy, expect up to 256MB memory usage to scrape all companies and their price tags
    $test_max_matches = 1000;         // max number of companies to scrape for, for example to break during testing. One match contains 0-10 companies
    $test_max_details = 1000;         // max number of company details
    $test_force_cache = 0;            // set to -1 to force a refresh of company match information, set to 0 to use the cache normally (default 2 month expiration), set to 1 to use the company cache without expiration checks
    $test_force_cache_details = 0;    // set to -1 to force realtime stock data, set to 0 to allow 1 hour old data (default), set to 1 to use available caches even if data is old

    
$PROXY = array();   // after the rotate api call this variable contains these elements: [address](proxy host),[port](proxy port),[external_ip](the external IP),[ready](0/1)
    
$PLAN = array();    // after the plan api call this variable contains the PLAN details about ip count, processes, protocol, etc
    
$dataset = array(); // this is our main data container it will contain all our results

    
if ($test_mode == 'details_only')
    {
        echo "Using predefined keywords\n";
        $primary_keywords[0] = explode(",", $test_companies);
    } else
    {
        echo "Generating keywords, this can take some seconds\n";
        $primary_keywords = combine($test_mutate_depth);
    }

    if (!count($primary_keywords)) die ("Error: no keywords defined/generated, check mutate_depth,test_mode,test_companies parameters.\n");
    if (!rmkdir($working_dir)) die("Failed to create/open $working_dir\n");

    $ready = get_plan();
    if (!$ready) die("The specified API credentials for user $uid are not active or invalid. \n");
    if ($PLAN['protocol'] != "http") die("Wrong proxy protocol configured, switch to HTTP and retry. \n");

    // Query API to get proper codes and domains for country and language selection
    $api_finance_data = get_api_google_finance(); // has to be globally reachable
    if (!$api_finance_data) die("Invalid country/language specified.\n");


    
$dataset=array();


    
    $ch = new_curl_session(); // $ch is the cURL handle for our requests


    
$data=get_companies($dataset,$primary_keywords,$api_finance_data,$test_mode,$test_max_matches,$test_max_details);

    
display_results($dataset);


?>
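A note on the memory limit computed in the configuration above: in PHP, `^` is the bitwise XOR operator, not exponentiation, so `26^$test_mutate_depth` does not yield 26 to the power of the depth. The short sketch below only illustrates the difference. The variable names are illustrative and not part of the original script, the source does not state which value the author intended, and the keyword count assumes (hypothetically) that the spider enumerates every combination of the letters a-z.

```php
<?php
// Illustration only: compare bitwise XOR with exponentiation for a spider depth of 4.
$depth = 4;                      // same value as $test_mutate_depth in the configuration

$xor_value = 26 ^ $depth;        // bitwise XOR: 0b11010 ^ 0b00100 = 0b11110
$pow_value = pow(26, $depth);    // exponentiation: 26 * 26 * 26 * 26

// Number of letter-only keywords of length 1..$depth, ASSUMING every
// combination of the 26 letters a-z is generated (not confirmed by the source).
$keyword_count = 0;
for ($len = 1; $len <= $depth; $len++) {
    $keyword_count += pow(26, $len);
}

echo "26 ^ $depth = $xor_value\n";
echo "pow(26, $depth) = $pow_value\n";
echo "keywords with 1..$depth letters = $keyword_count\n";
```

With a depth of 4 the XOR expression evaluates to 30, while `pow(26, 4)` gives 456976, so a memory formula built on one or the other produces very different limits.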
functions-sgf.php
<?php
    
/* License:
       Open source for private and commercial use.
       This source code is free to use and modify as long as this comment stays untouched on top.
       URL of original source code: http://scrape-google-finance.compunect.com/
       Author of original source code: http://www.compunect.com
       Under no circumstances and under no legal theory, whether in tort (including negligence), contract, or otherwise, shall the Licensor be liable to anyone for any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or the use of the Original Work including, without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses. This limitation of liability shall not apply to the extent applicable law prohibits such limitation.
       Exceptions:
       Public redistributing modifications of this source code project is not allowed without written agreement.
       Using this work for private and commercial projects is allowed, redistributing it is not allowed without our written agreement.
       In simple words: You may power your project with this code or a customized version of it, but you may NOT redistribute the code. Also any legal consequences are your own problem.
     */

    // just comment out the echo or extend the function to your liking
    
function verbose($text)
    {
        echo $text;
    }

    
/*
     * Returns up to date data to use for this scraping.
     * includes: user agent (in future, rotating user agents), google codes, domains, etc.
     * This function will only work with a valid plan at us-proxies.com
     *
     * You can remove the API parts and hardcode the values if you do not wish to get a plan there and use a different service for IPs
     * Otherwise this snippet will allow your code to automatically react to some changes
     */
    
function get_api_google_finance()
    {
        global $pwd;
        global $uid;
        global $PROXY;
        global $PLAN;
        global $portal;

        $fp = fsockopen("us-proxies.com", 80);
        if (!$fp)
        {
            echo "Unable to connect to google_cc API \n";

            return NULL; // connection not possible
        } else
        {
            $plan_size = $PLAN['total_ips'];
            fwrite($fp, "GET /g_api.php?api=1&uid=$uid&pwd=$pwd&cmd=google_finance&plan_size=$plan_size HTTP/1.0\r\nHost: us-proxies.com\r\nAccept: text/html, text/plain, text/*, */*;q=0.01\r\nAccept-Encoding: plain\r\nAccept-Language: en\r\n\r\n");
            stream_set_timeout($fp, 8);
            $res = "";
            $n = 0;
            while (!feof($fp))
            {
                if ($n++ > 4) break; // read at most 5 chunks
                $res .= fread($fp, 8192);
            }
            $info = stream_get_meta_data($fp);
            fclose($fp);

            if ($info['timed_out'])
            {
                echo "API: Connection timed out! \n";

                return NULL; // api timeout
            } else
            {
                $data = extractBody($res);
                if (strlen($data) < 4) return NULL; // invalid api response (moved before the return, where it was unreachable)
                $obj = unserialize($data);
                if (isset($obj['error'])) echo $obj['error'] . "\n";
                if (isset($obj['info'])) echo $obj['info'] . "\n";

                return $obj['data'];
            }
        }
    }

    function rmkdir($path, $mode = 0755)
    {
        if (file_exists($path)) return 1;

        return @mkdir($path, $mode);
    }

    
/* Delay (sleep) based on the license size to allow optimal scraping
     *
     * Warning!
     * Do NOT change the delay to be shorter than the specified delay.
     * This function will create a delay based on your total IP addresses.
     *
     */
    
    function delay_time($reason = 'ip', $total_threads = 1)
    {
        global $PLAN;
        global $api_finance_data;

        if ($reason == 'ip')
        {
            $d = $total_threads * $api_finance_data['delay_rotate_us'];
            verbose("\twait.. \n");
        }
        if ($reason == 'request')
        {
            $d = $api_finance_data['delay_query_us'];
            verbose("\twait.. \n");
        }
        usleep($d);
    }


    
/*
     * By default (no force) the function will load cached data within $max_hours hours otherwise reject the cache.
     * The time can be increased to reduce IP usage
     */
    
    function load_cache($keyword, $api_finance_data, $force_cache, $cache_type, $max_hours)
    {
        global $working_dir;

        if ($force_cache < 0) return NULL; // negative value: always refresh, never read the cache

        $file = "$working_dir/$cache_type.$keyword.cache";
        $now = time();
        if (file_exists($file))
        {
            $ut = filemtime($file);
            $dif = $now - $ut;
            $hour = (int)($dif / (60 * 60));
            if ($force_cache || ($dif < (60 * 60 * $max_hours)))
            {
                $serdata = file_get_contents($file);
                $serp_data = unserialize($serdata);
                verbose("\tusing cache, file age: {$hour}h\n");

                return $serp_data;
            }

            return NULL;
        } else
        {
            return NULL;
        }
    }

    function store_cache($data, $keyword, $api_finance_data, $cache_type)
    {
        global $working_dir;

        $file = "$working_dir/$cache_type.$keyword.cache";
        $now = time();
        if (file_exists($file))
        {
            $ut = filemtime($file);
            $dif = $now - $ut;
            //if ($dif < (60 * 60 * 24)) echo "Warning: cache storage initiated for $keyword which was already cached within the past 24 hours!\n";
        }
        $serdata = serialize($data);
        file_put_contents($file, $serdata, LOCK_EX);
        verbose("\tCache: stored file $file for $keyword.\n");
    }


    
// check_ip_usage() must be called before first use of mark_ip_usage()
    
    function check_ip_usage()
    {
        global $PROXY;
        global $working_dir;
        global $ip_usage_data; // usage data object as array

        if (!isset($PROXY['ready'])) return 0; // proxy not ready/started
        if (!$PROXY['ready']) return 0; // proxy not ready/started

        if (!isset($ip_usage_data))
        {
            if (!file_exists($working_dir . "/ipdata.obj")) // usage data object as file
            {
                echo "Warning!\n" . "The ipdata.obj file was not found, if this is the first usage of the rank checker everything is alright.\n" . "Otherwise removal or failure to access the ip usage data will lead to damage of the IP quality.\n\n";
                sleep(2);
                $ip_usage_data = array();
            } else
            {
                $ser_data = file_get_contents($working_dir . "/ipdata.obj");
                $ip_usage_data = unserialize($ser_data);
            }
        }

        if (!isset($ip_usage_data[$PROXY['external_ip']]))
        {
            //verbose("IP $PROXY[external_ip] is ready for use \n");

            return 1; // the IP was not used yet
        }
        if (!isset($ip_usage_data[$PROXY['external_ip']]['requests'][20]['ut_google']))
        {
            //verbose("IP $PROXY[external_ip] is ready for use \n");

            return 1; // the IP has not been used 20+ times yet, return true
        }
        $ut_last = (int)$ip_usage_data[$PROXY['external_ip']]['ut_last-usage']; // last time this IP was used
        $req_total = (int)$ip_usage_data[$PROXY['external_ip']]['request-total']; // total number of requests made by this IP
        $req_20 = (int)$ip_usage_data[$PROXY['external_ip']]['requests'][20]['ut_google']; // the 20th request (if IP was used 20+ times) unixtime stamp

        $now = time();
        if (($now - $req_20) > (60 * 60))
        {
            //verbose("IP $PROXY[external_ip] is ready for use \n");

            return 1; // more than an hour passed since 20th usage of this IP
        } else
        {
            $cd_sec = (60 * 60) - ($now - $req_20);
            verbose("IP $PROXY[external_ip] needs $cd_sec seconds cooldown, not ready for use yet \n");

            return 0; // the IP is overused, it can not be used for scraping without being detected by the search engine yet
        }
    }

    
/*
     * Updates and stores the ip usage data object
     * Marks an IP as used and re-sorts the access array
     */
    
    function mark_ip_usage()
    {
        global $PROXY;
        global $working_dir;
        global $ip_usage_data; // usage data object as array

        if (!isset($ip_usage_data)) die("ERROR: Incorrect usage. check_ip_usage() needs to be called once before mark_ip_usage()!\n");
        $now = time();

        $ip_usage_data[$PROXY['external_ip']]['ut_last-usage'] = $now; // last time this IP was used
        if (!isset($ip_usage_data[$PROXY['external_ip']]['request-total'])) $ip_usage_data[$PROXY['external_ip']]['request-total'] = 0;
        $ip_usage_data[$PROXY['external_ip']]['request-total']++; // total number of requests made by this IP
        // shift fifo queue: slot 1 holds the newest request, slot 20 the oldest one tracked
        for ($req = 19; $req >= 1; $req--)
        {
            if (isset($ip_usage_data[$PROXY['external_ip']]['requests'][$req]['ut_google']))
            {
                $ip_usage_data[$PROXY['external_ip']]['requests'][$req + 1]['ut_google'] = $ip_usage_data[$PROXY['external_ip']]['requests'][$req]['ut_google'];
            }
        }
        $ip_usage_data[$PROXY['external_ip']]['requests'][1]['ut_google'] = $now;

        $serdata = serialize($ip_usage_data);
        file_put_contents($working_dir . "/ipdata.obj", $serdata, LOCK_EX);
    }

    function new_curl_session($ch = NULL)
    {
        global $PROXY;
        global $api_finance_data;

        $default_agent = $api_finance_data['default_agent_chrome']; // rotating chrome useragent ?
        if ((!isset($PROXY['ready'])) || (!$PROXY['ready'])) return $ch; // proxy not ready

        if (isset($ch) && ($ch != NULL))
        {
            curl_close($ch);
        }
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $curl_proxy = "$PROXY[address]:$PROXY[port]";
        curl_setopt($ch, CURLOPT_PROXY, $curl_proxy);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_setopt($ch, CURLOPT_USERAGENT, $default_agent);

        return $ch;
    }

    function getip()
    {
        global $PROXY;
        if (!$PROXY['ready']) return -1; // proxy not ready

        $curl_handle = curl_init();
        curl_setopt($curl_handle, CURLOPT_URL, 'http://ipcheck.ipnetic.com/remote_ip.php'); // returns the real IP
        curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($curl_handle, CURLOPT_TIMEOUT, 10);
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
        $curl_proxy = "$PROXY[address]:$PROXY[port]";
        curl_setopt($curl_handle, CURLOPT_PROXY, $curl_proxy);
        $tested_ip = curl_exec($curl_handle);

        if (preg_match("^([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}^", $tested_ip))
        {
            curl_close($curl_handle);

            return $tested_ip;
        } else
        {
            $info = curl_getinfo($curl_handle);
            curl_close($curl_handle);

            return 0; // possible error would be a wrong authentication IP or a firewall
        }
    }


    
// return 1 if account is ready, otherwise 0
    
    function get_plan()
    {
        global $PLAN;

        $res = ip_service("plan"); // will fill $PLAN
        $ip = "";
        if ($res <= 0)
        {
            verbose("API error: Proxy API connection failed (Error $res). trying again later..\n\n");

            return 0;
        } else
        {
            $ready = ($PLAN['active'] == 1) ? "active" : "not active";
            verbose("\tAPI success: License is $ready.\n");
            if ($PLAN['active'] == 1) return 1;

            return 0;
        }
    }

    
    // returns the body of a raw HTTP response (everything after the first blank line), or '' if there is none
    function extractBody($response_str)
    {
        $parts = preg_split('|(?:\r?\n){2}|m', $response_str, 2);
        if (isset($parts[1])) return $parts[1]; else return '';
    }

    /*
     * This is the API function to retrieve US IP addresses
     * This function handles the API calls "plan" and "rotate"
     *
     * Rotate: On success this function will define the global $PROXY variable, adding the elements ready,address,port,external_ip and return 1
     * On failure the return is 0 or smaller and the PROXY variable ready element is set to "0"
     * It is good practice to use the API response in $PROXY instead of hardcoding connection parameters
     *
     * Plan: On success this function will define the global $PLAN variable, adding the elements active, max_ips, total_ips, protocol, processes and return 1
     * It is good practice to make one call to "plan" upon starting your script to find out about the status and size of the plan
     */

    function ip_service($cmd, $x = "")
    {
        global $pwd;
        global $uid;
        global $PROXY;
        global $PLAN;

        $fp = fsockopen("us-proxies.com", 80);
        if (!$fp)
        {
            echo "Unable to connect to API \n";

            return -1; // connection not possible
        } else
        {
            if ($cmd == "plan")
            {
                fwrite($fp, "GET /api.php?api=1&uid=$uid&pwd=$pwd&cmd=plan&extended=1 HTTP/1.0\r\nHost: us-proxies.com\r\nAccept: text/html, text/plain, text/*, */*;q=0.01\r\nAccept-Encoding: plain\r\nAccept-Language: en\r\n\r\n");

                stream_set_timeout($fp, 8);
                $res = "";
                $n = 0;
                while (!feof($fp))
                {
                    if ($n++ > 4) break;
                    $res .= fread($fp, 8192);
                }
                $info = stream_get_meta_data($fp);
                fclose($fp);

                if ($info['timed_out'])
                {
                    echo "API: Connection timed out! \n";
                    $PLAN['active'] = 0;

                    return -2; // api timeout
                } else
                {
                    if (strlen($res) > 1000) return -3; // invalid api response (check the API website for possible problems)
                    $data = extractBody($res);
                    $ar = explode(":", $data);
                    if (count($ar) < 4) return -100; // invalid api response
                    switch ($ar[0])
                    {
                        case "ERROR":
                            echo "API Error: $res \n";
                            $PLAN['active'] = 0;

                            return 0; // Error received
                            break;
                        case "PLAN":
                            $PLAN['max_ips'] = $ar[1]; // number of IPs licensed
                            $PLAN['total_ips'] = $ar[2]; // number of IPs assigned
                            $PLAN['protocol'] = $ar[3]; // current proxy protocol (http, socks, ..)
                            $PLAN['processes'] = $ar[4]; // number of available proxy processes
                            if ($PLAN['total_ips'] > 0) $PLAN['active'] = 1; else $PLAN['active'] = 0;

                            return 1;
                            break;
                        default:
                            echo "API Error: Received answer $ar[0], expected \"PLAN\"";
                            $PLAN['active'] = 0;

                            return -101; // unknown API response
                    }
                }

            } // cmd==plan


            
            if ($cmd == "rotate")
            {
                $PROXY['ready'] = 0;
                fwrite($fp, "GET /api.php?api=1&uid=$uid&pwd=$pwd&cmd=rotate&randomize=0&offset=0 HTTP/1.0\r\nHost: us-proxies.com\r\nAccept: text/html, text/plain, text/*, */*;q=0.01\r\nAccept-Encoding: plain\r\nAccept-Language: en\r\n\r\n");
                stream_set_timeout($fp, 8);
                $res = "";
                $n = 0;
                while (!feof($fp))
                {
                    if ($n++ > 4) break;
                    $res .= fread($fp, 8192);
                }
                $info = stream_get_meta_data($fp);
                fclose($fp);

                if ($info['timed_out'])
                {
                    echo "API: Connection timed out! \n";

                    return -2; // api timeout
                } else
                {
                    if (strlen($res) > 1000) return -3; // invalid api response (check the API website for possible problems)
                    $data = extractBody($res);
                    $ar = explode(":", $data);
                    if (count($ar) < 4) return -100; // invalid api response
                    switch ($ar[0])
                    {
                        case "ERROR":
                            echo "API Error: $res \n";

                            return 0; // Error received
                            break;
                        case "ROTATE":
                            $PROXY['address'] = $ar[1];
                            $PROXY['port'] = $ar[2];
                            $PROXY['external_ip'] = $ar[3];
                            $PROXY['ready'] = 1;
                            usleep(230000); // additional time to avoid connecting during proxy bootup phase, removing/reducing this can cause random connection failures but will increase overall performance for very large plans

                            return 1;
                            break;
                        default:
                            echo "API Error: Received answer $ar[0], expected \"ROTATE\"";

                            return -101; // unknown API response
                    }
                }
            } // cmd==rotate
        }
    }

    
// obtain a fresh IP through us-proxies.com API
    
    function rotate_proxy()
    {
        global $PROXY;
        global $ch;
        $max_errors = 3;
        $success = 0;
        while ($max_errors--)
        {
            $res = ip_service("rotate"); // will fill $PROXY
            $ip = "";
            if ($res <= 0)
            {
                verbose("API error: Proxy API connection failed (Error $res). trying again soon..\n\n");
                sleep(21); // retry after a while, maybe a routing failure
            } else
            {
                verbose("\tAPI success: Received new private IP\n"); // $PROXY[external_ip] on port $PROXY[port] // reduced message length a bit
                $success = 1;
                break;
            }
        }
        if ($success)
        {
            $ch = new_curl_session($ch);

            return 1;
        } else
        {
            return "API rotation failed. Check account, firewall and API credentials.\n";
        }
    }

    
// gets company matches for one keyword
    
    function scrape_matches($keyword, $api_finance_data)
    {
        global $ch;

        $data = array('success' => 0, 'data' => array());
        $google_ip = $api_finance_data['domain'];
        $matchtype = $api_finance_data['matchtype'];

        $keyword_enc = urlencode($keyword);
        $url = "http://$google_ip/finance/match?matchtype=$matchtype&q=$keyword_enc";

        curl_setopt($ch, CURLOPT_URL, $url);
        $htmdata = curl_exec($ch);
        if (!$htmdata)
        {
            $error = curl_error($ch);
            $info = curl_getinfo($ch);
            echo "\tError scraping: $error [ $error ]\n";
            sleep(3);

            return $data;
        } else
        {
            if (strlen($htmdata) < 2)
            {
                sleep(3);
                echo "\tError scraping: empty result\n";

                return $data;
            }
        }
        if (($data_ar = json_decode($htmdata, true)) !== null)
        {
            if (isset($data_ar['matches']))
            {
                $matches = array();
                foreach ($data_ar['matches'] as $match)
                {
                    $tmp = array();
                    $tmp['short'] = $match['t'];   // ticker symbol
                    $tmp['title'] = $match['n'];   // company name
                    $tmp['market'] = $match['e'];  // exchange
                    $tmp['id'] = $match['id'];
                    $matches[] = $tmp;
                }
                $data['success'] = 1;
                $data['data'] = $matches;
            }
        }

        return $data;
    }

    
// wrapper to get a full set of keywords
    
    function get_companies(&$dataset, $keywords, $api_finance_data, $test_mode, $max_results = 0xffffff, $max_details = 0xffffff)
    {
        global $test_force_cache;
        global $test_force_cache_details;

        $rotate_now = 0; // set to 1 to force a rotation after launch, even if IP is not marked as overused
        $empty_counter = 0; // count empty replies
        $result = array('success' => 0, 'errors' => 0, 'success_detail' => 0, 'errors_details' => 0); // counters initialized up front to avoid undefined index notices
        $start_time = time();
        $company_detail_count = 0;

        $rcounter = 1; // used for rotation calls (do not modify!)
        $counter = 0; // used for regular reportings
        
        foreach ($keywords as $idx => $keys)
            foreach ($keys as $keyword)
            {
                verbose("Scraping matches for length $idx and search term '$keyword'\n");
                $cdata = load_cache($keyword, $api_finance_data, $test_force_cache, 'matches', 24 * 7 * 4 * 2); // default is two month expiration of cache data
                // check IP usage
                $ip_ready = check_ip_usage(); // test if ip has not been used within the critical time
                // obtain new IP if necessary
                if (!$cdata) // omit all of this if we have a cache
                {
                    if ((!$ip_ready || $rotate_now))
                    {
                        while (!$ip_ready || $rotate_now) // test if the IP is ready or overused
                        {
                            $ok = rotate_proxy(); // start/rotate to the IP that has not been started for the longest time, also tests if proxy connection is working
                            if ($ok != 1)
                            {
                                echo("Fatal error: proxy rotation failed:\n $ok\n");
                                $result['success'] = -1;

                                return $result;
                            }
                            $ip_ready = check_ip_usage(); // test if ip has not been used within the critical time
                            if (!$ip_ready)
                            {
                                echo("Fatal error: No fresh IPs left, wait a while and retry or obtain a larger plan. \n"); // proper error handling relies on exclusive use of the plan and rotation randomization == 0
                                $result['success'] = -2;

                                return $result;
                            } else
                            {
                                $rotate_now = 0;
                                delay_time('ip'); // proper delay
                                break; // ip rotated successfully
                            }
                        }
                    } else
                    {
                        delay_time('request');
                    }
                }


                if ($cdata)
                {
                    // we have the data already in cache
                    $result['success']++;
                    $dataset['matches'][$keyword] = $cdata['data'];
                } else
                {
                    // we have to make a live request
                    $scrape_result = scrape_matches($keyword, $api_finance_data);
                    if ($scrape_result['success'] == 1)
                    {
                        if (!($rcounter++ % 5)) $rotate_now = 1; // rotate the IP after every 5th successful request
                        $result['success']++;
                        $result['errors'] = 0;
                        $dataset['matches'][$keyword] = $scrape_result['data'];
                        mark_ip_usage(); // store IP usage, this is very important to avoid detection and gray/blacklistings
                        $cdata['keyword'] = $keyword;
                        $cdata['type'] = 'match';
                        $cdata['result_count'] = count($scrape_result['data']);
                        $cdata['data'] = $scrape_result['data'];
                        store_cache($cdata, $keyword, $api_finance_data, 'matches'); // store results into local cache
                    } else
                    {
                        if (!($empty_counter++ % 5)) $rotate_now = 1;
                        $dataset['matches'][$keyword] = array();
                        if ($result['errors']++ > 20)
                        {
                            echo "More than 20 errors without results in between, hard abort\n";
                            $result['success'] = -3;

                            return $result;
                        }
                    }
                }




                if (($test_mode == 'matches_and_details') || ($test_mode == 'details_only'))
                {
                    // We scrape for all companies and detail pages at the same time
                    $matches_to_scrape = $dataset['matches'][$keyword];
                    if ($test_mode == 'details_only')
                    {
                        // only the first match is used; reset() replaces each(), which was removed in PHP 8
                        $match = reset($dataset['matches'][$keyword]);
                        $matches_to_scrape = array($match);
                    }

                    foreach ($matches_to_scrape as $match)
                    {
                        if ($company_detail_count++ >= $max_details)
                        {
                            echo "Finished max detail count of $max_details\n";

                            return $result;
                        }
                        $identification = $match['market'] . ":" . $match['short'];
                        verbose("\tScraping stockinfo details for $identification ($match[title]) #$match[id]\n");
                        $cdata = load_cache($identification, $api_finance_data, $test_force_cache_details, 'stockinfo', 3); // cached data is accepted for up to 3 hours ($max_hours); for realtime data set $test_force_cache_details to -1
                        // check IP usage
                        $ip_ready = check_ip_usage(); // test if ip has not been used within the critical time
                        // obtain new IP if necessary
                        if (!$cdata) // omit all of this if we have a cache
                        {
                            if ((!$ip_ready || $rotate_now))
                            {
                                while (!$ip_ready || $rotate_now) // test if the IP is ready or overused
                                {
                                    $ok = rotate_proxy(); // start/rotate to the IP that has not been started for the longest time, also tests if proxy connection is working
                                    if ($ok != 1)
                                    {
                                        echo("Fatal error: proxy rotation failed:\n $ok\n");
                                        $result['success'] = -1;

                                        return $result;
                                    }
                                    $ip_ready = check_ip_usage(); // test if ip has not been used within the critical time
                                    if (!$ip_ready)
                                    {
                                        echo("Fatal error: No fresh IPs left, wait a while and retry or obtain a larger plan. \n"); // proper error handling relies on exclusive use of the plan and rotation randomization == 0
                                        $result['success'] = -2;

                                        return $result;
                                    } else
                                    {
                                        $rotate_now = 0;
                                        delay_time('ip'); // proper delay
                                        break; // ip rotated successfully
                                    }
                                }
                            } else
                            {
                                delay_time('request');
                            }
                        }


                        if (
$cdata)
                        {
                            
// we have the data already in cache
                            
$result['success_detail']++;
                            
$identification$match['market'].":".$match['short'];
                            
$dataset['stockinfo'][$identification] = $cdata['data'];
                        } else
                        {
                            
// need to scrape

                            
$scrape_resultget_company_details$match['short'],$match['market'], $api_finance_data'm'); // minute mode
                            
$identification$match['market'].":".$match['short'];
                            if (
$scrape_result['success'] == 0)
                            {

                                
verbose("\t\tScraping realtime price failed, trying to the last price within a month (24h price resolution)\n");
                                
$scrape_resultget_company_details$match['short'],$match['market'], $api_finance_data'd'); // daily mode
                            
}
                            if ($scrape_result['success'] == 1)
                            {
                                verbose("\tsuccessfully scraped stock prices for $identification\n");
                                $rotate_now = 1;
                                $result['success_detail']++;
                                $result['errors_details'] = 0;
                                mark_ip_usage(); // store IP usage, this is very important to avoid detection and gray/blacklistings
                                $cdata['keyword'] = $identification;
                                $cdata['type'] = 'stockinfo';
                                $cdata['data'] = $scrape_result['data'];
                                store_cache($cdata, $identification, $api_finance_data, 'stockinfo'); // store results into local cache
                                $dataset['stockinfo'][$identification] = $scrape_result['data'];
                            } elseif ($scrape_result['success'] == 0)
                            {
                                verbose("\tscraping exchange details for $identification failed due to no price data available\n");
                                $result['errors_details'] = 0;
                                mark_ip_usage(); // store IP usage, this is very important to avoid detection and gray/blacklistings
                                $cdata['keyword'] = $identification;
                                $cdata['type'] = 'stockinfo';
                                $cdata['data'] = $scrape_result['data'];
                                store_cache($cdata, $identification, $api_finance_data, 'stockinfo'); // store results into local cache
                                $dataset['stockinfo'][$identification] = $scrape_result['data'];
                            } else
                            {
                                
                                verbose("\tscraping exchange details for $identification failed due to an error\n");
                                $cdata['keyword'] = $identification;
                                $cdata['type'] = 'stockinfo';
                                $cdata['error'] = 1;
                                $cdata['data'] = $scrape_result['data'];
                                store_cache($cdata, $identification, $api_finance_data, 'stockinfo'); // store error into local cache
                                $rotate_now = 1;
                                $dataset['stockinfo'][$identification] = array();
                                if ($result['errors_details']++ > 10)
                                {
                                    echo "More than 10 stockinfo errors without results in between, hard abort due to possible detection\n";
                                    $result['success'] = -30;
                                    return $result;
                                }

                            }
                        } // cache else
                        $identification = $match['market'].":".$match['short'];
                        $age = time()-$cdata['data']['timestamp_price'];
                        verbose("\t\t{$match['short']} on exchange '{$cdata['data']['exchange']}': USD {$cdata['data']['price']}; Price age:$age\n");

                        echo "\n";
                    }
                }

                
                $num_results = count_results($dataset, 'matches');
                if ($num_results >= $max_results)
                {
                    echo "reached configured max results, ending..\n";
                    break;
                } else
                {
                    $spent = time()-$start_time;
                    $time_str = "$spent seconds";
                    if ($spent > 3600)
                        $time_str = (int)($spent/60)." minutes";
                    if (!($counter++ % 10)) verbose("\033[1m Time spent: $time_str, matches: $num_results\033[0m\n");
                }

            }

        return $result;
    }

    
    // get details, fill/load cache and dataset, return success or failure
    // on success a new IP rotation and proper delay are required to prevent getting blocked
    // rotation and cache support are included within this function to keep this function usable standalone
    // function returns the data and success codes but also manages $dataset
    function get_company_details($id, $ex, $api_finance_data, $test_detail_accuracy='m')
    {
        global $ch;

        $data = array('success' => 0, 'data' => array());
        $google_ip = "www.google.com";
        $i = $api_finance_data['get_price_now']['i'];
        $p = $api_finance_data['get_price_now']['p'];
        $f = $api_finance_data['get_price']['f'];

        if ($test_detail_accuracy == 'd')
        {
            // Daily accuracy mode, only useful for failed scrapes (closed exchanges, dead companies).
            $i = $api_finance_data['get_price_day']['i'];
            $p = $api_finance_data['get_price_day']['p'];
        }


        
        $url = "http://$google_ip/finance/getprices?q=$id&x=$ex&i=$i&p=$p&f=$f";

        curl_setopt($ch, CURLOPT_URL, $url);
        $htmdata = curl_exec($ch);
        if (!$htmdata)
        {
            $error = curl_error($ch);
            $info = curl_getinfo($ch);
            echo "\tError scraping: $error [ HTTP code {$info['http_code']} ]\n";
            sleep(3);

            return $data;
        } else
        {
            if (!strstr($htmdata, 'EXCHANGE'))
            {
                sleep(3);
                echo "\tError scraping: invalid result\n";
                return $data;
            }
        }
        
        $htmdata = urldecode($htmdata);
        $regex = '/^EXCHANGE=([^\n$]*).*^INTERVAL=([\d]*).*^a([\d]+).*^([\d]{1,3}?),([\d\.]+)/sm';
        preg_match($regex, $htmdata, $results);
        if (isset($results[5]))
        {
            $data['success'] = 1;
            $data['data']['exchange'] = $results[1]; // stock exchange
            $data['data']['price'] = floatval($results[5]); // live stock closing price
            $data['data']['timestamp_scrape'] = time();
            $data['data']['timestamp_price'] = (int)$results[2]*(int)$results[4]+(int)$results[3]; // interval * offset + base timestamp

        
        } else
        {
            $regex = '/^EXCHANGE=([^\n$]*).*/sm';
            preg_match($regex, $htmdata, $results);
            if (isset($results[1]))
            {
                $data['success'] = 0; // scrape successful but no price information available, or none found in the selected accuracy mode
                $data['data']['exchange'] = $results[1]; // stock exchange
                $data['data']['price'] = '-1';
                $data['data']['timestamp_scrape'] = time();
                $data['data']['timestamp_price'] = 0;
            } else
                $data['success'] = -1; // hard error, the response did not include the expected data
        }
        return $data;

    }


    
    // counts all results
    function count_results(&$dataset, $type='matches')
    {
        $num = 0;
        if ($type == 'matches')
        {
            foreach ($dataset['matches'] as $kw => $results) $num += count($results);
        }

        return $num;
    }


    function combine($len, $chars=NULL, $result='')
    {
        static $resultarray = array();
        static $max_len = null;
        if (is_null($max_len)) $max_len = $len;
//        if (!$resultarray) $resultarray=new SplFixedArray(pow(26, 4)); // php 5.3, optional memory reduction feature (note: ^ is XOR in PHP, not exponentiation)
        
        if (!$chars) $chars = range('A','Z');
        if ($len > 0)
        {
            if ($result != '') $resultarray[$max_len-$len][] = $result;
            foreach ($chars as $char)
            {
                combine($len - 1, $chars, $result.$char);
            }

        } else
            $resultarray[$max_len-$len][] = $result;
        if ($result == '')
            return $resultarray;

    }

    function display_results(&$dataset)
    {
        $mask = "| %-10.10s | %-10.10s | %-70.70s | %-10.10s | %-15.15s |\n";
        $separator = str_repeat("-", 70);

        echo "\nGoogle Finance Exchange Spider results (unsorted)\n";
        printf($mask, 'Identifier', 'Exchange', 'Company name', 'Price', 'Price time');
        printf($mask, $separator, $separator, $separator, $separator, $separator);

        foreach ($dataset['matches'] as $kw => $matches)
        {
            foreach ($matches as $match)
            {
                $identification = $match['market'].":".$match['short'];
                $identifier = $match['short'];
                $company = $match['title'];
                $exchange = 'UNKNOWN EXCHANGE';
                $price = 'unknown';
                $timestamp_price = 'unknown';
                if (isset($dataset['stockinfo'][$identification]['exchange'])) $exchange = $dataset['stockinfo'][$identification]['exchange'];
                if (isset($dataset['stockinfo'][$identification]['price'])) $price = $dataset['stockinfo'][$identification]['price'];
                if (isset($dataset['stockinfo'][$identification]['timestamp_price']))
                {
                    $timestamp_price = gmdate("H:i:s", $dataset['stockinfo'][$identification]['timestamp_price'])." GMT";
                }

                printf($mask, $identifier, $exchange, $company, $price, $timestamp_price);
            }
        }

    }


?>