Question or problem about Python programming:
I am using win32.client in python for converting my .xlsx and .xls file into a .csv. When I execute this code it’s giving an error. My code is:
def convertXLS2CSV(aFile): '''converts a MS Excel file to csv w/ the same name in the same directory''' print "------ beginning to convert XLS to CSV ------" try: import win32com.client, os from win32com.client import constants as c excel = win32com.client.Dispatch('Excel.Application') fileDir, fileName = os.path.split(aFile) nameOnly = os.path.splitext(fileName) newName = nameOnly[0] + ".csv" outCSV = os.path.join(fileDir, newName) workbook = excel.Workbooks.Open(aFile) workbook.SaveAs(outCSV, c.xlCSVMSDOS) # 24 represents xlCSVMSDOS workbook.Close(False) excel.Quit() del excel print "...Converted " + nameOnly + " to CSV" except: print ">>>>>>> FAILED to convert " + aFile + " to CSV!" convertXLS2CSV("G:\\hello.xlsx")
I am not able to find the error in this code. Please help.
How to solve the problem:
Solution 1:
I would use xlrd – it’s faster, cross platform and works directly with the file. One thing to note – it doesn’t work on xlsx files – so you’d have to save your Excel file as xls. Edit: As of version 0.8.0, xlrd
reads both XLS and XLSX files.
import xlrd import csv def csv_from_excel(): wb = xlrd.open_workbook('your_workbook.xls') sh = wb.sheet_by_name('Sheet1') your_csv_file = open('your_csv_file.csv', 'wb') wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL) for rownum in xrange(sh.nrows): wr.writerow(sh.row_values(rownum)) your_csv_file.close()
Solution 2:
I would use pandas
. The computationally heavy parts are written in cython or c-extensions to speed up the process and the syntax is very clean. For example, if you want to turn “Sheet1” from the file “your_workbook.xls” into the file “your_csv.csv”, you just use the top-level function read_excel
and the method to_csv
from the DataFrame
class as follows:
import pandas as pd data_xls = pd.read_excel('your_workbook.xls', 'Sheet1', index_col=None) data_xls.to_csv('your_csv.csv', encoding='utf-8')
Setting encoding='utf-8'
alleviates the UnicodeEncodeError
mentioned in other answers.
Solution 3:
Maybe someone find this ready-to-use piece of code useful. It allows to create CSVs from all spreadsheets in Excel’s workbook.
# -*- coding: utf-8 -*- import xlrd import csv from os import sys def csv_from_excel(excel_file): workbook = xlrd.open_workbook(excel_file) all_worksheets = workbook.sheet_names() for worksheet_name in all_worksheets: worksheet = workbook.sheet_by_name(worksheet_name) with open(u'{}.csv'.format(worksheet_name), 'wb') as your_csv_file: wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL) for rownum in xrange(worksheet.nrows): wr.writerow([unicode(entry).encode("utf-8") for entry in worksheet.row_values(rownum)]) if __name__ == "__main__": csv_from_excel(sys.argv[1])
Solution 4:
I’d use csvkit, which uses xlrd (for xls) and openpyxl (for xlsx) to convert just about any tabular data to csv.
Once installed, with its dependencies, it’s a matter of:
python in2csv myfile > myoutput.csv
It takes care of all the format detection issues, so you can pass it just about any tabular data source. It’s cross-platform too (no win32 dependency).
Solution 5:
First read your excel spreadsheet into pandas, below code will import your excel spreadsheet into pandas as a OrderedDict type which contain all of your worksheet as dataframes. Then simply use worksheet_name as a key to access specific worksheet as a dataframe and save only required worksheet as csv file by using df.to_csv(). Hope this will workout in your case.
import pandas as pd df = pd.read_excel('YourExcel.xlsx', sheet_name=None) df['worksheet_name'].to_csv('YourCsv.csv')
If your Excel file contain only one worksheet then simply use below code:
import pandas as pd df = pd.read_excel('YourExcel.xlsx') df.to_csv('YourCsv.csv')
If someone want to convert all the excel worksheets from single excel workbook to the different csv files, try below code:
import pandas as pd def excelTOcsv(filename): df = pd.read_excel(filename, sheet_name=None) for key, value in df.items(): return df[key].to_csv('%s.csv' %key)
This function is working as a multiple Excel sheet of same excel workbook to multiple csv file converter. Where key is the sheet name and value is the content inside sheet.
Solution 6:
@andi I tested your code, it works great, BUT
In my sheets there’s a column like this
2013-03-06T04:00:00
date and time in the same cell
It gets garbled during exportation, it’s like this in the exported file
41275.0416667
other columns are ok.
csvkit, on the other side, does ok with that column but only exports ONE sheet, and my files have many.
Solution 7:
xlsx2csv is faster than pandas and xlrd.
xlsx2csv -s 0 crunchbase_monthly_.xlsx cruchbase
excel file usually comes with n sheetname.
-s is sheetname index.
then, cruchbase folder will be created, each sheet belongs to xlsx will be converted to a single csv.
p.s. csvkit is awesome too.
Solution 8:
Quoting an answer from Scott Ming, which works with workbook containing multiple sheets:
Here is a python script getsheets.py (mirror), you should install pandas
and xlrd
before you use it.
Run this:
pip3 install pandas xlrd # or `pip install pandas xlrd`
How does it works?
$ python3 getsheets.py -h Usage: getsheets.py [OPTIONS] INPUTFILE Convert a Excel file with multiple sheets to several file with one sheet. Examples: getsheets filename getsheets filename -f csv Options: -f, --format [xlsx|csv] Default xlsx. -h, --help Show this message and exit.
Convert to several xlsx:
$ python3 getsheets.py goods_temp.xlsx Sheet.xlsx Done! Sheet1.xlsx Done! All Done!
Convert to several csv:
$ python3 getsheets.py goods_temp.xlsx -f csv Sheet.csv Done! Sheet1.csv Done! All Done!
getsheets.py
:
# -*- coding: utf-8 -*- import click import os import pandas as pd def file_split(file): s = file.split('.') name = '.'.join(s[:-1]) # get directory name return name def getsheets(inputfile, fileformat): name = file_split(inputfile) try: os.makedirs(name) except: pass df1 = pd.ExcelFile(inputfile) for x in df1.sheet_names: print(x + '.' + fileformat, 'Done!') df2 = pd.read_excel(inputfile, sheetname=x) filename = os.path.join(name, x + '.' + fileformat) if fileformat == 'csv': df2.to_csv(filename, index=False) else: df2.to_excel(filename, index=False) print('\nAll Done!') CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help']) @click.command(context_settings=CONTEXT_SETTINGS) @click.argument('inputfile') @click.option('-f', '--format', type=click.Choice([ 'xlsx', 'csv']), default='xlsx', help='Default xlsx.') def cli(inputfile, format): '''Convert a Excel file with multiple sheets to several file with one sheet. Examples: \b getsheets filename \b getsheets filename -f csv ''' if format == 'csv': getsheets(inputfile, 'csv') else: getsheets(inputfile, 'xlsx') cli()
Solution 9:
We can use Pandas lib of Python to conevert xls file to csv file
Below code will convert xls file to csv file .
import pandas as pd
Read Excel File from Local Path :
df = pd.read_excel("C:/Users/IBM_ADMIN/BU GPA Scorecard.xlsx",sheetname=1)
Trim Spaces present on columns :
df.columns = df.columns.str.strip()
Send Data frame to CSV file which will be pipe symbol delimted and without Index :
df.to_csv("C:/Users/IBM_ADMIN/BU GPA Scorecard csv.csv",sep="|",index=False)
Solution 10:
Python is not the best tool for this task. I tried several approaches in Python but none of them work 100% (e.g. 10% converts to 0.1, or column types are messed up, etc). The right tool here is PowerShell, because it is an MS product (as is Excel) and has the best integration.
Simply download this PowerShell script, edit line 47 to enter the path for the folder containing the Excel files and run the script using PowerShell.