Scikit Learn Machine Learning Tutorial for investing with Python p. 5

In this video, we build on the previous machine learning with scikit-learn tutorial, and we're going to be pulling out the specific data point that we're interested in using as a feature.

Bitcoin donations: 1GV7srgR4NJx4vrk7avCmmVQQrqmv87ty6
Comments

On Mac I'm getting this error:

IndexError: list index out of range

Gator

Hello, Harrison.
Thanks for the tutorial. I found that my code cannot get all the 'Total Debt/Equity' values.
After getting some of the values, it starts to throw an error: [IndexError: list index out of range].

I checked the HTML source code. The standard markup we're searching for should be:
<tr><td class="yfnc_tablehead1" width="75%">Total Debt/Equity (mrq):</td><td

But there are some exceptions in the source code:


<td class="yfnc_tablehead1" width="75%">Total Debt/Equity (mrq):</td>
<td


<tr><th scope="row" width="75%">Total Debt/Equity (mrq):</th><td

Would you show us some Beautiful Soup skills to get around it? Thank you.
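One way to cope with all three variants is to stop matching raw strings and let an HTML parser find the cells. The sketch below uses the standard library's html.parser instead of Beautiful Soup, so nothing extra needs installing. The names CellCollector and stat_after_label are hypothetical. It collects the text of every <td> and <th> cell in document order and returns the cell right after the one whose text equals the label. Because of that, it doesn't matter whether the label sits in a <td> or a <th>, or whether a newline separates the cells.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []       # finished cell texts
        self._current = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._current is not None:
            self.cells.append("".join(self._current).strip())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def stat_after_label(html, label="Total Debt/Equity (mrq):"):
    """Return the text of the cell that follows the label cell, or None."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = parser.cells
    for i, text in enumerate(cells[:-1]):
        if text == label:
            return cells[i + 1]
    return None


sample = '<tr><th scope="row" width="75%">Total Debt/Equity (mrq):</th><td>0.45</td></tr>'
print(stat_after_label(sample))  # 0.45
```

When the label is missing, the function returns None instead of raising IndexError. It assumes the value is the very next cell after the label. That matches the snippets quoted above, but it has not been checked against every saved page.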

verykoala

To access the ticker, shouldn't it be:

ticker = each_dir.split("/")[-1] # -1 is the last field
or
ticker = each_dir.split("\\")[-1]
?
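Taking the last component ([-1]) is the right idea, because a fixed index like [1] depends on how deep the path is. The minimal sketch below avoids choosing a separator at all by using os.path.basename. The function ticker_from_dir and the example path are hypothetical.

```python
import os


def ticker_from_dir(each_dir):
    # normpath drops a trailing separator; basename then returns the last
    # component using the running OS's path rules, so this works on
    # Windows, macOS and Linux without hard-coding "/" or "\\".
    return os.path.basename(os.path.normpath(each_dir))


# Example path (hypothetical layout): <path>/_KeyStats/aapl
example = os.path.join("data", "_KeyStats", "aapl")
print(ticker_from_dir(example))  # aapl
```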

roccococolombo

For Mac:
ticker =

Also, to catch the IndexError:
try:
    value =
except IndexError:
    value = 'null'
print(ticker + ":", value)
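The IndexError happens when the marker is not in the page. In that case split() returns a one-element list, so indexing [1] fails. The sketch below avoids the exception instead of catching it, using str.partition. value_between is a hypothetical helper, and the marker strings in the demo are placeholders, not the exact Yahoo markup.

```python
def value_between(source, start_marker, end_marker="</td>"):
    """Return the stripped text between start_marker and the next end_marker.

    Returns None when either marker is missing. str.partition never raises:
    if the separator is absent, the middle item of its result is "".
    """
    _, found, rest = source.partition(start_marker)
    if not found:
        return None
    value, found_end, _ = rest.partition(end_marker)
    if not found_end:
        return None
    return value.strip()


# Placeholder page fragment and marker, for illustration only.
page = "<td>Total Debt/Equity (mrq):</td><td>0.45</td>"
value = value_between(page, "Total Debt/Equity (mrq):</td><td>")
if value is None:
    value = "null"  # same fallback as the try/except version
print(value)  # 0.45
```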

baeem

Hi (and thanks for all these really nice tutorials), it seems that there is something iffy with the aapl ticker for the file named 20060203134959.html (and others as well). Using
results in a "list index out of range" error. I pressed Ctrl+U on it, and it seems that the line is cut off after </td>. I used a hack to work around it:
try:
    value =
except Exception as e:
    print(str(e))
    value = float('nan')
But it is not a very good hack, since the value should be 0.

kennethnielsen

You are really for this topic, thanks for your videos.

brosales

+sentdex, thank you for your tutorials. They have been great introductions to machine learning. As many people have pointed out, Yahoo has changed its website (some items we used to scrape from the HTML are no longer there). I was wondering whether, with these changes, grabbing the visible text on the website would be a better approach for data gathering.
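Grabbing the visible text is a reasonable fallback when the tags keep changing, because it ignores whether a value lives in a <td> or a <th>. Below is a rough sketch using the standard library's html.parser. VisibleText, visible_text and stat_from_text are hypothetical names. The sketch assumes the value is the first whitespace-separated token after the label text, which may not hold for every page layout.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that a browser would render, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Join the pieces, then collapse every whitespace run to one space.
    return " ".join(" ".join(parser.parts).split())


def stat_from_text(text, label="Total Debt/Equity (mrq):"):
    # Assumption: the value is the first non-space token after the label.
    match = re.search(re.escape(label) + r"\s*(\S+)", text)
    return match.group(1) if match else None


page = ('<tr><th scope="row">Total Debt/Equity (mrq):</th>\n'
        '<td>0.45</td></tr><script>var x = 1;</script>')
print(stat_from_text(visible_text(page)))  # 0.45
```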

CMAZZONI

If someone has the out-of-range problem, here is a solution:


value = source.split(gather + ':</td><td


It's necessary to edit it a bit later in Notepad++, but in general it works XD.

PoradnikiPanaMietka

To all the people getting "Index out of range": the reason for that error is that the HTML markup differs between files, so you need to handle multiple conditions while splitting the markup. I have handled the scenarios I found, as below. The code might not be optimal, but it covers all the scenarios.

def Keystats(gather="Total Debt/Equity (mrq):"):
    statspath = path+'\_KeyStats'
    stock_list = [x[0] for x in os.walk(statspath)]
    for each_dir in stock_list[1:]:
        each_file = os.listdir(each_dir)
        if(len(each_file)>0):
            for lfile in each_file:
                date_stamp = datetime.strptime(lfile, '%Y%m%d%H%M%S.html')
                unix_time =
                filePath = each_dir+'\\'+lfile
                fileContent = open(filePath, 'r').read()
                fileContentNew = fileContent.split(gather + '</td>')
                if(len(fileContentNew)==1):
                    fileContentNew = + '</th>')
                    if(len(fileContentNew)==1):
                        print(filePath)
                        continue
                    fileContentNew = fileContentNew[1]

                    fileContentNew = fileContentNew.split('\n<td
                else:
                    fileContentNew = fileContentNew.split('<td
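The nested if/split chain above can also be collapsed into a single regular expression. The pattern accepts either <td> or <th> for the label cell, and any whitespace (including newlines) between the cells. This sketch assumes the value cell is always a <td> that directly follows the label cell. key_stat is a hypothetical name. Regular expressions over HTML remain fragile, so treat this as a quick fix rather than a parser.

```python
import re


def key_stat(source, gather="Total Debt/Equity (mrq):"):
    """Return the text of the <td> that follows the label cell, or None.

    The label cell may be <td ...> or <th ...>, and whitespace (including
    newlines) is allowed between the label cell and the value cell.
    """
    pattern = (
        r"<t[dh][^>]*>\s*"        # opening label cell: <td ...> or <th ...>
        + re.escape(gather)       # the label text, matched literally
        + r"\s*</t[dh]>\s*"       # closing label cell, then any whitespace
        + r"<td[^>]*>(.*?)</td>"  # value cell; capture its contents
    )
    match = re.search(pattern, source, flags=re.DOTALL)
    return match.group(1).strip() if match else None


sample = '<td class="yfnc_tablehead1" width="75%">Total Debt/Equity (mrq):</td>\n<td>0.45</td>'
print(key_stat(sample))  # 0.45
```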

rudreshgp

Great set of lectures!
I had an issue; hopefully you can help me with it.

While parsing the local files, the code starts with the files from srcl (in KeyStats) instead of starting from a (the first one), for no apparent reason. I can't figure out why. I've tried the same code that is published on your website, and the same thing happens.
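A likely cause is that os.listdir() and os.walk() return entries in whatever order the filesystem reports them, which is not guaranteed to be alphabetical. Sorting the names explicitly fixes the order. The sketch below shows this; ticker_dirs and snapshot_files are hypothetical helpers.

```python
import os
import tempfile


def ticker_dirs(statspath):
    """Yield the ticker sub-directories of statspath in alphabetical order."""
    # os.listdir() (and os.walk(), which scans directories the same way)
    # makes no promise about ordering, so sort the names explicitly.
    for name in sorted(os.listdir(statspath)):
        full_path = os.path.join(statspath, name)
        if os.path.isdir(full_path):
            yield full_path


def snapshot_files(each_dir):
    """Return the regular files in one ticker directory, sorted by name."""
    return sorted(
        name for name in os.listdir(each_dir)
        if os.path.isfile(os.path.join(each_dir, name))
    )


# Demo on a throwaway tree with hypothetical tickers created out of order.
with tempfile.TemporaryDirectory() as root:
    for ticker in ("vz", "ctas", "a"):
        os.mkdir(os.path.join(root, ticker))
    demo_order = [os.path.basename(p) for p in ticker_dirs(root)]
print(demo_order)  # ['a', 'ctas', 'vz']
```

The snapshot file names are fixed-width YYYYmmddHHMMSS timestamps, so sorting them as strings also sorts them chronologically.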

ArnavArora

A while ago I requested a video, and I was wondering whether it will come up on your list soon: it was about taking screenshots and compiling them into a video. Also, a new request: a screen recorder.

AncientEntity

Love your vids and appreciate them very much. Is it possible to give a 3-second pause between code runs? I regularly try to pause at just the right time and run mine against yours. I run Fedora, and ticker = each_dir.split("\\")[1] fails, clearly the difference between running Windoze and NixNux. Could you point out OS differences in the future?

ianrickey

Just wondering, can you directly query the Yahoo Finance API for all that data?

peterkanini

Whenever I run this, the values print from "ctas" first instead of "a". After that, values from "vz" are printed, then "gme", and so on in a seemingly random order. Is there a reason it isn't going through the folders in order?

ryanmiller

I couldn't find the code on the webpage linked in your description. Could you please fix the links?

AtulKakrana

I didn't get the part about gather. Can you please explain it?

jeetkhetan

Greetings from Mexico! I fell in love with Python; it's very easy.

GEEKO

Hi sentdex, may I know how you scraped all the key statistics and financial information from Yahoo in HTML format? I am unable to find past key statistics and financial data on Yahoo (e.g. 2016 - 2007). Please teach us how to scrape past financial data from Yahoo in a new video.

ngjun

For Mac users: the error ValueError: time data '.DS_Store' does not match format '%Y%m%d%H%M%S.html' happens because macOS automatically creates a .DS_Store file in each folder. These files are hidden, but the Python script still picks them up. If you run into this error, just delete the .DS_Store files. Search "recursively remove .DS_Store files" for instructions.
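Instead of deleting the files, the loop can skip any name that does not parse as a timestamp. The sketch below uses parse_snapshot_name, a hypothetical helper. It returns None for hidden files like .DS_Store and for any other name that does not match the format, so the caller can continue to the next file.

```python
from datetime import datetime


def parse_snapshot_name(filename, fmt="%Y%m%d%H%M%S.html"):
    """Return the datetime encoded in a snapshot file name, or None.

    Hidden files (names starting with ".", such as macOS's .DS_Store) and
    any other name that does not match fmt are treated as "skip this file".
    """
    if filename.startswith("."):
        return None
    try:
        return datetime.strptime(filename, fmt)
    except ValueError:
        return None


# Example directory listing; only the real snapshot survives the filter.
listing = [".DS_Store", "20060203134959.html", "notes.txt"]
for name in listing:
    date_stamp = parse_snapshot_name(name)
    if date_stamp is None:
        continue  # not a snapshot file, move on
    print(name, date_stamp)  # 20060203134959.html 2006-02-03 13:49:59
```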

danfisher

Still shows out of range
ticker =
IndexError: list index out of range

hntddt